ABSTRACT


This thesis describes issues involved in node allocation for a Large Grain Data Flow
(LGDF) model used in Navy signal processing applications. In the model studied, nodes
are assigned to processors based on load balancing, communication / computation overlap,
and memory module contention. Current models using the Revolving Cylinder (RC)
technique for LGDF graph analysis do not adequately address node allocation. Thus, a
node to processor allocation component is added to a computer simulator of an LGDF
graph model. It is demonstrated that the RC technique, when proper node allocation is
taken into account, can improve overall throughput as compared to the
first-come-first-served (FCFS) technique for high communication / computation costs.

I. INTRODUCTION


The Revolving Cylinder (RC) technique [Ref. 1] was developed as an attempt to
enhance throughput over the First-Come-First-Served (FCFS) technique for dispatching
[...]. A computer programmed simulator based
on the Department of the Navy's AN/UYS-2 Digital Signal Processing System, also
known as the Enhanced Modular Signal Processor (EMSP) [Ref. 2], was developed to
evaluate the RC techniques with respect to such machines. In this thesis, a node to
processor allocation component has been added to the simulator.

A.  BACKGROUND 

Large Grain Data Flow (LGDF) graphs are particularly suited to describing
applications where large amounts of data are generated and require predictable, periodic
processing. Thus, LGDF graphs are often used to model signal processing applications
with specific throughput requirements. LGDF graph execution can be carried out using a
balance of compile-time and run-time decisions in order to achieve the most efficient
throughput. Digital signal processing (DSP) applications lend themselves easily to
compile-time analysis because DSP applications are very specific in the computation
required for each node [Ref. 3]. The AN/UYS-2 programs use large grain data flow
execution as their run-time environment and thus can be modeled using an LGDF graph
representation.

For an LGDF graph receiving periodic input data, FCFS cannot provide uniform
throughput under high loads because the nodes receiving external data become ready
independent of other nodes in the graph, and thus the nodes higher in the graph become
ready before the lower nodes in the graph. This results in system congestion and causes a
decrease in throughput. The RC technique adds graph dependencies to the nodes in the
graph, thus reducing or eliminating this congestion to ensure a more uniform throughput.

The FCFS scheduling technique places nodes into the system based on when the
nodes are ready. Thus FCFS cannot benefit from compile-time efforts in scheduling nodes,
nor does it bind nodes to specific processors for execution. In previous applications of the
RC technique, graph dependencies were added at compile-time based on node allocation
that was performed randomly. Performance with this random allocation, however, was
poorer than that provided by FCFS. Thus, in order to ensure that RC facilitates better
performance than FCFS, it is necessary to modify the generation of graph dependencies
using the RC technique based on the node to processor allocation.

B. THESIS SCOPE AND CONTRIBUTION

This thesis describes an algorithm for allocation of nodes to processors for an LGDF
graph. A real application modeled as an LGDF graph is studied, based on a signal
correlator graph representing an actual application running on the AN/UYS-2. Results are
generated using the node allocation program as well as previously developed software, and
comparisons are made between the First-Come-First-Served (FCFS) technique and the
Revolving Cylinder (RC) technique.

C.  THESIS  ORGANIZATION 

Chapter II describes the issues involved in node allocation for improving the
performance of the LGDF. Included are the problems existing with current allocation
methods and the issues addressed as a result of these deficiencies. Chapter III gives a
description of the algorithms used in the node allocation program as they relate to the issues
in Chapter II. Chapter IV presents the analysis of data generated from several scheduling
methods. Chapter V summarizes the results, presents conclusions drawn from the data
analysis, and provides topics of further study.

II. ISSUES IN ALLOCATION OF NODES


There are several issues relating to the task of node allocation. In the model discussed
in this thesis, nodes are assigned to processors based on several factors, such as load
balancing, overlap of communication and computation, and contention between nodes for
memory modules. Each of these is important, and a delicate balance between these factors
must be accomplished in order to achieve maximum utilization of the processors.

A.  PROBLEMS  WITH  CURRENT  ALLOCATION 

Node allocation in the general sense refers to the binding of nodes to specific
processors for execution based on certain criteria. Allocation is separate from scheduling,
which refers to determining the time at which the node executes on the processor to which
it is allocated. Without proper node allocation, the processors cannot execute at their most
efficient level, and throughput for the data flow graph is reduced as a result. To
demonstrate this, the programs in Appendix A and [Ref. 4] were used on a test data flow
graph, illustrated in Figure 2.1, to allocate nodes and simulate the data flow graph.

The graph shows two input/output processors and 13 nodes. Even numbered nodes
were assumed to have two times the number of execution cycles as odd numbered nodes.
Each individual queue's produce, consume, write, and read amounts were considered
equal; however, these values differed over different queues. These are the values shown on
the queues in Figure 2.1. The queue capacity was equal to eight times the queue threshold.
The simulation was run with three processors, and no setup or breakdown latency for the
nodes was assumed. In addition, the scheduler latency was zero and the communication
time for one word was five cycles. The simulation was run first without node allocation,
i.e., the nodes are assigned to processors without regard for satisfying the criteria
described above, and then with proper node allocation. In the first case, the nodes were
allocated dynamically at run-time based on which node was ready and which processor was
free. In the second case, the nodes were allocated statically at compile-time based on load
balancing, queue contention, and memory module contention. The results are compared in
the graph of Figure 2.2. Note the lower utilization rate of the execution unit of the
processor for the simulation without node allocation as well as the lower throughput.


Figure 2.1. Test Data Flow Graph

Figure 2.2. Improvement With Allocation Over No Allocation

B. ISSUES ADDRESSED

1.  Load  Balancing 

In order to ensure the processors are being fully utilized, it is important to ensure
that the nodes executing across processors are balanced with respect to execution and/or
communication times. Since the emphasis of the node allocation algorithms is based upon
maximization of the execution unit utilization, load balancing for the processors will focus
mainly on the execution time of the nodes. Load balancing is achieved by statically
assigning nodes to processors based on execution times of the nodes, attempting to
maintain the same number of execution cycles per processor.
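
The balancing goal described above can be sketched as a greedy assignment. This is only an illustration under stated assumptions, not the allocation program of this thesis (where the initial allocation is supplied by the user): the names `Node`, `balance_load`, and `total_cycles`, and the longest-first heuristic itself, are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical node record: only the execution time matters for balancing.
struct Node {
    int id;            // assumed to equal the node's index in the input vector
    long exec_cycles;  // execution cycles of the node
};

// Greedy longest-first balancing (an assumed heuristic, not taken from the
// thesis): visit nodes in decreasing order of execution cycles and give each
// one to the processor with the smallest accumulated total so far.
// Returns, for each processor, the ids of the nodes assigned to it.
std::vector<std::vector<int>> balance_load(std::vector<Node> nodes,
                                           std::size_t num_processors) {
    std::vector<std::vector<int>> assignment(num_processors);
    if (num_processors == 0) return assignment;
    std::vector<long> load(num_processors, 0);

    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const Node& a, const Node& b) {
                         return a.exec_cycles > b.exec_cycles;
                     });
    for (const Node& n : nodes) {
        // Index of the least loaded processor (first one on ties).
        std::size_t p = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        assignment[p].push_back(n.id);
        load[p] += n.exec_cycles;
    }
    return assignment;
}

// Sum of execution cycles over a list of node ids (ids index into `nodes`).
long total_cycles(const std::vector<Node>& nodes, const std::vector<int>& ids) {
    long sum = 0;
    for (int id : ids) sum += nodes[static_cast<std::size_t>(id)].exec_cycles;
    return sum;
}
```

With six nodes whose execution times alternate between 20 and 10 cycles and three processors, this sketch gives every processor 30 execution cycles. It ignores communication times, which the text above treats as secondary.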

2. Overlap

Overlap of communication and computation is important to the LGDF model of
computation. The system contains both a control unit and an execution unit per processor.
It is desirable to utilize both of these units in a way that permits use of the execution unit to
the fullest extent possible. This is achieved through overlap of communication and computation.
There are two conditions which must be met for nodes to overlap sufficiently such that the
execution unit is utilized to the fullest extent possible. For two nodes j and j+1, where
node j executes on the processor before node j+1, the following two conditions should
exist:

$$\text{execution}_j \geq \text{setup}_{j+1} \qquad (1)$$

and

$$\text{breakdown}_j \leq \text{execution}_{j+1} \qquad (2)$$

Ideally, perfect overlap of communication and computation is desired, such as
that shown in Figure 2.3.
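
Conditions (1) and (2) translate directly into two comparisons. The sketch below is illustrative only; the type `NodeTimes` and the function names are assumptions and do not come from the programs used in this thesis.

```cpp
// Per-node cycle counts; the field names are illustrative assumptions.
struct NodeTimes {
    long setup;      // input (setup) cycles on the control unit
    long execution;  // cycles on the execution unit
    long breakdown;  // output (breakdown) cycles on the control unit
};

// Condition (1): execution of node j is at least as long as setup of node j+1.
bool condition1(const NodeTimes& j, const NodeTimes& next) {
    return j.execution >= next.setup;
}

// Condition (2): breakdown of node j is no longer than execution of node j+1.
bool condition2(const NodeTimes& j, const NodeTimes& next) {
    return j.breakdown <= next.execution;
}

// True when both conditions hold for the consecutive pair (j, j+1).
bool overlaps_fully(const NodeTimes& j, const NodeTimes& next) {
    return condition1(j, next) && condition2(j, next);
}
```

For example, a node with 20 execution cycles and 5 breakdown cycles followed by a node with 10 setup cycles and 15 execution cycles satisfies both conditions.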

Figure 2.3. Ideal Communication / Computation Overlap [Ref. 2]

In this figure, it is assumed that node 0 has been executing for some time before
node 1 is assigned. Here there are no idle or blocked cycles on the processor, since nodes
can progress immediately from input (setup) to execution to output (breakdown). Note that
both condition (1) and condition (2) are met for all nodes and the execution unit is operating
continuously. This, however, is not always the case in reality, as shown in Figure 2.4.

Figure 2.4. Typical Communication / Calculation Overlap [Ref. 2]

In Figure 2.4, there is contention for the execution unit since node 2 has
completed input but cannot progress to the execution unit because node 1 is still executing.
This results in blocked cycles until node 1 has finished executing. In addition, idle cycles
may also exist, such as above, where node 1 has finished breakdown but node 2 is still
executing and does not yet require the control unit. It is desirable to limit the blocked and
idle times to maximize overlap wherever possible. In addition, a situation may also occur
where there is poor overlap, such as that in Figure 2.5.

Figure 2.5. Poor Communication / Computation Overlap [Ref. 2]

In this figure node 2 has not completed setup after node 1 finishes execution, and
node 1 must therefore wait for access to the control unit, creating idle cycles on the
execution unit. In addition, node 1's breakdown is longer than node 2's execution. This
results in additional idle cycles, since node 2 must break down and node 3 must set up
before the execution unit is utilized again.

3. Memory Contention

The memory modules are the representation for the system memory [Ref. 1].
Each processor must address memory modules to transfer data to or from a node during a
read or write operation, respectively. Each queue in the data flow graph is assigned a
memory module either by the user or arbitrarily by the scheduler. Only one processor can
access a given memory module at any one time. It is possible, however, for a processor to
be accessing a memory module (either reading or writing) while another processor is
attempting to utilize the same memory module. This is memory contention. Thus the
processor which is attempting to access the memory module must wait until the memory
module is free. This delays the completion of the graph and affects throughput. Memory
contention can be reduced or avoided by ensuring that sufficient memory modules exist to
fulfill the requirements of all the queues of the graph. Alternatively, queues can be mapped
on the available memory modules such that this contention is minimized.

C.  WRAP-AROUND 

Wrap-around is a technique used to maximize the overlap as permitted by the RC
approach by statically 'wrapping' the breakdown time of the last node to the idle or blocked
time at the head of a cylinder. An example will better illustrate this principle. Figure 2.[?] is
a static representation of a cylinder with three nodes on a single processor.


[Timeline diagram with columns "Control Unit" and "Execution Unit" for nodes j, j+1, and j+2]

Figure 2.[?]. Cylinder Without Wrap-Around


Both control unit and execution unit are shown. Note that because node j+1's setup
time is shorter than node j's execution time, blocked cycles result on the control unit. If
node j+1's breakdown time is sufficiently short such that node j+2's breakdown time can
be placed in the blocked cycle time, the cylinder length is reduced and the number of
blocked cycles is reduced, increasing control unit utilization. The resultant cylinder is
shown in Figure 2.6.

Note that the iteration index of node j+2's breakdown has changed to indicate that the
breakdown is now from a previous iteration. The goal of wrap-around is to attempt to
shorten the length of the cylinder by an amount equal to the length of the breakdown time
of the last node without extending the length of the execution unit.


[Timeline diagram with columns "Control Unit" and "Execution Unit" for nodes j, j+1, and
j+2; the breakdown of node j+2 carries the iteration index (-1)]

Figure 2.6. Cylinder With Wrap-Around

In general, for one or two nodes j (and j+1) executing on a processor where node j
executes before node j+1, wrap-around is possible if:

$$\text{setup}_{j+1} + \text{breakdown}_j + \text{breakdown}_{j+1} \leq \text{execution}_j + \text{execution}_{j+1}$$

as long as at least condition (1) is satisfied.
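
A minimal sketch of this two-node test follows. The type `NodeTimes` and the function name are hypothetical; condition (1) is taken to be that the execution time of node j is at least the setup time of node j+1.

```cpp
// Per-node cycle counts; the names are illustrative assumptions.
struct NodeTimes {
    long setup;
    long execution;
    long breakdown;
};

// Two-node wrap-around test: condition (1) must hold, and the control unit
// work (setup of j+1 plus both breakdowns) must fit within the combined
// execution time of the two nodes.
bool wrap_possible_two_nodes(const NodeTimes& j, const NodeTimes& next) {
    const bool cond1 = j.execution >= next.setup;
    const long control_cycles = next.setup + j.breakdown + next.breakdown;
    const long execution_cycles = j.execution + next.execution;
    return cond1 && control_cycles <= execution_cycles;
}
```

For instance, with node j = (setup 5, execution 20, breakdown 5) and node j+1 = (setup 10, execution 15, breakdown 5), the control unit needs 20 cycles against 35 execution cycles, so the test succeeds.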

For three or more nodes on a processor, the general case becomes more complicated,
because there is a potential for the third node's setup time to occur during the second
node's execution time. In this case, wrap-around is dependent on which condition(s) listed
above is (are) satisfied.

Let there exist nodes j, j+1, j+2, and j+N, where j is the first node, j+1 is the second
node, j+2 is the third node, and j+N is the last node on a processor with N nodes. For
exactly three nodes on a processor, j+N and j+2 are synonymous. There are three cases
for wrap-around:

Case 1: only condition (1) is satisfied. Figure 2.7 illustrates this case. Here, since
node j's breakdown is greater than node j+1's execution time, node j+2's setup time cannot
be overlapped with node j+1's execution time. Thus, wrap-around is possible if:

$$\text{setup}_{j+1} + \text{breakdown}_{j+N} \leq \text{execution}_j$$


[Timeline diagram with columns "Control Unit" and "Execution Unit"; the breakdown of
node j+N carries the iteration index (-1)]

Figure 2.7. Wrap-Around (Case 1)

Case 2: only condition (2) is satisfied. Figure 2.8 illustrates this. For this condition
wrap-around cannot occur at all, since doing so would extend the length of the execution
unit.

Case 3: both condition (1) and (2) exist. This case is shown in Figure 2.9. For this
case wrap-around is possible if:

$$\text{setup}_{j+1} + \text{breakdown}_j + \text{setup}_{j+2} + \text{breakdown}_{j+N} \leq \text{execution}_j + \text{execution}_{j+1}$$

Note that if neither condition (1) nor condition (2) applies, wrap-around is not possible.
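
The three cases can be summarized in a short decision routine. This is a sketch under assumptions: the names `NodeTimes`, `WrapCase`, `classify`, and `wrap_possible` are hypothetical, the conditions are evaluated on the first two nodes of the processor, and only processors with three or more nodes are handled.

```cpp
#include <vector>

// Per-node cycle counts; the names are illustrative assumptions.
struct NodeTimes {
    long setup;
    long execution;
    long breakdown;
};

enum class WrapCase { None, Case1, Case2, Case3 };

// Decide which case applies from conditions (1) and (2) on nodes j and j+1.
WrapCase classify(const NodeTimes& j, const NodeTimes& j1) {
    const bool cond1 = j.execution >= j1.setup;      // condition (1)
    const bool cond2 = j.breakdown <= j1.execution;  // condition (2)
    if (cond1 && cond2) return WrapCase::Case3;
    if (cond1) return WrapCase::Case1;
    if (cond2) return WrapCase::Case2;
    return WrapCase::None;
}

// Wrap-around test for a processor holding three or more nodes, given in
// execution order. Returns false for shorter lists (not covered here).
bool wrap_possible(const std::vector<NodeTimes>& nodes) {
    if (nodes.size() < 3) return false;
    const NodeTimes& j = nodes[0];
    const NodeTimes& j1 = nodes[1];
    const NodeTimes& j2 = nodes[2];
    const NodeTimes& last = nodes.back();  // node j+N
    switch (classify(j, j1)) {
        case WrapCase::Case1:
            return j1.setup + last.breakdown <= j.execution;
        case WrapCase::Case3:
            return j1.setup + j.breakdown + j2.setup + last.breakdown
                   <= j.execution + j1.execution;
        default:  // Case 2, or neither condition holds
            return false;
    }
}
```

The routine returns false both for Case 2 and when neither condition holds, matching the statements above that wrap-around is not possible in those situations.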


[Timeline diagram with columns "Control Unit" and "Execution Unit"; the wrapped
breakdown carries the iteration index (-1)]

Figure 2.9. Wrap-Around (Case 3)

III. ALGORITHM FOR NODE ALLOCATION


This chapter discusses the particular node allocation algorithm that addresses the issues
discussed in the previous chapter within the concept of the LGDF model. Initial allocation
of the nodes to processors is accomplished by the user, taking into account proper load
balancing. The remaining issues are handled by the algorithms discussed below.

A.  OVERLAP 

Overlap is accomplished by first taking each processor individually and scheduling the
node with the greatest execution time first. This algorithm is illustrated in Figure 3.1. The
nodes are then scheduled on each processor with regard to overlap as in Figure 3.2.


for i = 1 to total_number_of_processors
    for processor P_i
        while node_j != NULL
            if execution_j <= execution_{j+1}
                temp = node_j
                node_j = node_{j+1}
                node_{j+1} = temp
                j = j + 1
            endif
        endwhile
    end for
end for


Figure 3.1. Execution Cycle Scheduling Algorithm

In the overlap algorithm, the second node on the processor is initially compared to the
first node. If the setup of the second node is less than the execution time of the first node,
then overlap can occur and the second node is scheduled after the first node.




for i = 1 to total_number_of_processors
    for processor P_i
        j = 1
        k = j + 1
        schedule node_j
        while node_j != NULL
            while execution_j < setup_k
                k = k + 1
            endwhile
            temp = node_{j+1}
            node_{j+1} = node_k
            node_k = temp
            j = j + 1
        end while
    end for
end for


Figure 3.2. Overlap Algorithm

If this condition is not true, the following node on the processor is then compared to
the first node, and this process continues until a suitable node is found or until all nodes on
the processor have been checked. If all nodes on the processor have been checked and
none are found suitable, or if a node has been found which meets the conditions, the node
is scheduled and this node is then compared to the remaining nodes. This process
continues until all nodes on the processor have been exhausted. This scheduling method is
performed on each processor in turn. It is assumed that since the nodes are initially
scheduled in decreasing order of execution, the breakdown of the previous node will
likely be less than the execution time of the next node.
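
The two ordering steps for a single processor can be sketched in C++ as follows. This is a hedged illustration rather than the code of the node allocation program: the names `Node` and `order_for_overlap` are hypothetical, a node is treated as suitable when its setup time does not exceed the execution time of the node before it, and when no suitable node is found the existing order is kept.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node record with per-node cycle counts.
struct Node {
    int id;
    long setup;
    long execution;
    long breakdown;
};

// Order the nodes of one processor for overlap.
// Step 1: sort by decreasing execution time.
// Step 2: for each position j, find the first later node whose setup fits
//         within the execution time of node j and move it to position j+1.
//         If no later node fits, the current order is left unchanged.
std::vector<Node> order_for_overlap(std::vector<Node> nodes) {
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const Node& a, const Node& b) {
                         return a.execution > b.execution;
                     });
    for (std::size_t j = 0; j + 1 < nodes.size(); ++j) {
        for (std::size_t k = j + 1; k < nodes.size(); ++k) {
            if (nodes[k].setup <= nodes[j].execution) {
                std::swap(nodes[j + 1], nodes[k]);
                break;
            }
        }
    }
    return nodes;
}
```

As an example, three nodes with (setup, execution) of (5, 30), (40, 20), and (10, 10) are first sorted by execution time; the node with a setup of 40 cycles cannot overlap the first node's 30 execution cycles, so the node with a setup of 10 cycles is placed second instead.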

B. WRAP-AROUND


The wrap-around algorithm is shown in Figure 3.3. For each processor, the
breakdown time of the last node is taken and summed with the setup time of the second
node and the breakdown time of the first node. This sum is compared to the sum of the
execution times of the first and second nodes. If the sum of the setup and breakdown times
is less than the sum of the execution times, the last node breakdown time can be
wrapped-around. There are several other conditions which can also occur. Typically, for more than
three nodes scheduled on a processor, it is possible for the setup time of the third node to
occur during the execution time of the second node after the breakdown of the first node.
In this case, the setup time of the third node is also summed with the setup time of the
second node and the breakdown times of the first node and the last node.


for i = 1 to total_number_of_processors
    for processor P_i
        j = 1
        if breakdown_j + setup_{j+1} + breakdown_{j+N} <= execution_j + execution_{j+1}
            start breakdown_{j+N} at (setup_j + breakdown_j + setup_{j+1}) cycles
        end if
    end for
end for


Figure 3.3. Wrap-Around Algorithm

IV. RUN-TIME PERFORMANCE


This chapter describes the results for use of the revolving cylinder algorithm. The
programs used for generation of the results are fully described in Appendix B. Figure 4.1
is a graphical summary of the programs and their related inputs and outputs.

A.  PERFORMANCE  METRICS 

The performance evaluations for the RC technique were generated using an actual
application graph called a correlator [Ref. 4]. This graph is illustrated in Figure 4.2. The
RC technique that was analyzed was the start after finish (SAF) technique. The results
from this technique were compared to an FCFS scheduling algorithm. Simulations were
performed on cylinders generated for both wrap-around and non wrap-around techniques.

Several initial assumptions were made for the RC cylinders. The scheduler latency,
node setup and breakdown latency, and instruction size were assumed to be zero. The
read, write, produce, consume, and threshold amounts for an individual queue were
assumed to be equal. The queue capacity was calculated as eight times the queue threshold.
Nodes were manually allocated to processors based on load balancing and minimizing
queue contention; that is, no processor would simultaneously access the same queue for
reading and writing. As many memory modules as necessary to completely eliminate
memory module contention were then assigned to processors. The number of memory
modules required was based on the static representation of the cylinder generated by the
scheduler and mapping programs. Eight processors were used in the system and the node
to processor allocation was identical throughout the simulations.

Figure 4.1. Revolving Cylinder Program Summary

Figure 4.2. Correlator Graph [Ref. 5]

B. RESULTS


Figures 4.3 and 4.4 illustrate the normalized maximum throughput for the correlator
versus the ratio of communication cycles to computation cycles. The communication costs
used for the mapping were varied from 3 to 23 cycles to transfer one word of data from a
processor to memory. These correspond to communication/computation ratios of 0.1 to
0.77, respectively. The theoretical minimum average input period was used as the
normalizing factor. This normalizing factor was calculated by taking the inverse of the
ideal cylinder calculation for one instance of the graph and multiplying by $1 \times 10^6$. The
$1 \times 10^6$ factor is necessary since maximum throughput is given by the simulator in instances
per megacycle.

[Plot; x-axis: Ratio of Communication Cycles to Computation Cycles]

Figure 4.3. Normalized Maximum Throughput vs.
Communication/Computation (No Wrap)

The 'mapper' points listed in the legend represent the maximum theoretical throughput
for the compile-time representation of the cylinder. This value is obtained by taking the
inverse of the end time of activities obtained by the map program multiplied by $1 \times 10^6$. In
Figure 4.4, there are two representations of the mapper. The first is a 'flat' cylinder. Each
static cylinder slice of the graph ends at a different time, represented as a number of cycles.
The 'flat' cylinder takes the greatest end time of all cylinder slices and uses that value as the
average end time of the graph. This means if a cylinder slice ends before this average end
time, idle cycles may be added to the execution unit, thereby decreasing the calculated
throughput. The 'jagged' cylinder, however, takes into account each individual cylinder
slice end time, and uses the average of the end times over all cylinder slices as the average
end time. Thus, maximum throughput for the 'jagged' cylinder is greater than the 'flat'
cylinder.

In  both  Figure  4.3  and  Figure  4.4,  note  that  as  communication  costs  increase,  SAF 
results  in  better  throughput  than  FCFS.  This  is  due  to  the  ability  in  SAF  to  map  the  nodes 
to  minimize  contention  [Ref.  2]. 

Since the node to processor allocation was identical throughout the simulations, it was
desirable to see if different allocation at various communication costs would have an effect
on throughput. A separate node to processor allocation was tried for 15 and 20 cycles to
transfer one word of data from a processor to memory. The allocation of nodes was
modified only slightly, i.e., only one node was allocated to a different processor. These
points are indicated in Figures 4.3 and 4.4 as 'New Map'. It is clear for both wrap and no
wrap cases that the revolving cylinder values (SAF and mapper) are affected by slight
changes in the node allocation.

Figures 4.5 and 4.6 represent, respectively, the normalized response time and the
coefficient of variation of normalized response time, for both the no wrap and wrap cases.
The normalizing factor used in Figure 4.5 is the number of execution cycles required for
the completion of one iteration of the critical path of the graph. Note that the response time
for SAF (both no wrap and wrap cases) is lower than FCFS at high communication costs.
Note also that although SAF no wrap has a slightly better response time than with wrap at
high communication costs, modifying the node to processor allocation (New Map) has a
significant effect on the no wrap case. Thus, it is possible to improve the response times
for both cases by changing the node allocation.

The coefficient of variation represented in Figure 4.6 is a measured comparison
between the response times of all graph instances to the average response time. The lower
this number, the closer the measured response times are to the average [Ref. 2]. SAF with
wrap appears to have the best overall performance as measured by coefficient of variation
throughout the range of communication costs. Again, however, modification of the node
to processor allocation significantly affects the results, thus indicating that coefficient of
variation could be improved over FCFS for both SAF cases.
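
For reference, the coefficient of variation is conventionally the standard deviation of the samples divided by their mean. The sketch below assumes the population form of the standard deviation, which is an assumption and is not stated in [Ref. 2]; the function name is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Coefficient of variation of a set of response times:
// population standard deviation divided by the mean.
// Returns 0 for an empty input or a zero mean to avoid dividing by zero.
double coefficient_of_variation(const std::vector<double>& response_times) {
    if (response_times.empty()) return 0.0;
    const double n = static_cast<double>(response_times.size());

    double sum = 0.0;
    for (double t : response_times) sum += t;
    const double mean = sum / n;
    if (mean == 0.0) return 0.0;

    double squared_deviation = 0.0;
    for (double t : response_times) {
        squared_deviation += (t - mean) * (t - mean);
    }
    return std::sqrt(squared_deviation / n) / mean;
}
```

Identical response times give a value of zero, and the value grows as individual graph instances stray further from the average response time.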


[Plot; x-axis: Ratio of Communication Cycles to Computation Cycles, from 0.1 to 0.77]


Figure 4.5. Normalized Response Time vs. Communication/Computation

Figures 4.7, 4.8 and 4.9 represent normalized maximum throughput for
communication costs of 3 cycles, 5 cycles, and 15 cycles versus load. Load in this case is
based on fractional multiples of the maximum throughput case (1.0 in the figure). These
multiples correspond to a range of graph input from severe lack of input data to overflow of
data. From these figures, SAF results in slightly better overall throughput at higher graph
loads versus FCFS. Although SAF no wrap performs better than SAF with wrap at low
communication costs, SAF with wrap achieves a higher overall throughput over SAF no
wrap and FCFS at high communication costs for the entire range of loads, which is the
desired result.

[Plot; y-axis: Coefficient of Variation for Graph Response Time]

[Plot; y-axis: Simulated Maximum Throughput / Theoretical Maximum Throughput]

Figure 4.9. Normalized Maximum Throughput vs. Load (15 Cycles/Word)
(0.50 Communication/Computation)

Figures 4.10 through 4.15 illustrate normalized response time versus load and
coefficient of variation versus load for the same communication costs as the three previous
figures. From these figures, SAF is shown to have the best overall response time and
lowest coefficient of variation throughout the range of load.


Figure  4.10.  Normalized  Response  Time  vs.  Load  (3  Cycles/Word) 
(0.10  Communication/Computation) 

Figure 4.11. Normalized Response Time vs. Load (5 Cycles/Word)
(0.17 Communication/Computation)


Figure 4.12. Normalized Response Time vs. Load (15 Cycles/Word)
(0.5 Communication/Computation)

[Plot; y-axis: Coefficient of Variation for Graph Response Time]

Figure 4.14. Coefficient of Variation vs. Load (5 Cycles/Word)
(0.17 Communication/Computation)

[Plot; y-axis: Coefficient of Variation for Graph Response Time]

Figure 4.15. Coefficient of Variation vs. Load (15 Cycles/Word)
(0.5 Communication/Computation)

V. CONCLUSION


This thesis described the issues involved in node allocation and described a program
implemented to resolve those issues. An addition to the RC technique, wrap-around, was
also analyzed as an improvement to the compile-time implementation of the graph.

A revolving cylinder technique, start-after-finish, was studied and compared to the
First-Come-First-Served technique for a large grain data flow graph model. It was
demonstrated that RC provides overall better throughput than FCFS, particularly at high
communication costs. In addition, it was shown that the RC technique is sensitive to
cylinder mapping, especially at high communication costs. Thus, it is important in the
analysis of the RC technique to optimize the mapping for each instance of communication
cost.

A.  FURTHER  RESEARCH 

Several initial assumptions were made for the graph model that could be removed in
future work.

1. The number of instructions for each node was assumed to be zero. Analysis
should be conducted with variable instruction lengths.

2. Scheduler latency was also assumed to be zero. This quantity should also be
varied and its effect on the RC technique studied.

3. Since the RC results were sensitive to cylinder mapping, it would be desirable to
find an optimum cylinder mapping for each level of communication cost. From this a
heuristic could be developed such that an extra program module could be added to the
existing programs to perform this task automatically.
APPENDIX A. NODE ALLOCATION PROGRAM


// LIEUTENANT JOHN P. CARDANY, U.S. NAVY
// 20 APRIL 1994

// NAVAL POSTGRADUATE SCHOOL

// ADVISORS: PROFESSORS SHRIDHAR SHUKLA AND AMR ZAKY

// Large Grain Data Flow Node to Processor Schedule Program
// schedule.C


#include <iostream.h>
#include <fstream.h>
#include <iomanip.h>
#include <stdlib.h>


#include "node_alloc.h"

node_alloc cylinder; // define a cylinder as a node_alloc type

int
main()
{
    cout << "\n\nLARGE GRAIN DATA FLOW NODE TO PROCESSOR SCHEDULING PROGRAM\n\n";

    cout << "\nALLOCATING NODES...\n";

    // System calls
    cylinder.define_times();
    cylinder.read_processor_file();
    cylinder.read_queue_file();
    cylinder.change_node_file();
    cylinder.order_nodes();
    cylinder.sequence_nodes();

    cout << "\nEND OF PROGRAM\n";
    return 0;
}
// LIEUTENANT JOHN CARDANY, U.S. NAVY
// 20 APRIL 1994

// NAVAL POSTGRADUATE SCHOOL

// ADVISORS: PROFESSORS SHRIDHAR SHUKLA AND AMR ZAKY

// Node Allocation Class Header File
// node_alloc.h


#ifndef NODE_ALLOC_H
#define NODE_ALLOC_H

#include <fstream.h>
#include <iostream.h>
#define newln '\n'

class node_alloc
{
private:

    // Structure to define queues
    struct queue_type
    {
        int queue_id;
        int source_node;
        int sink_node;
        long write_amount;
        long read_amount;
    };

    // Structure to define nodes
    struct node_type
    {
        int nodes_per_processor;
        int node_id;
        long instr_size;
        long setup_time;
        long exe_time;
        long breakdown_time;
        int proc_type;
        long start_time;
        long end_time;
    };

    // Structure to define ORDER.IN elements
    struct order_in_type
    {
        int node_id;
        long start_time;
    };

    // User inputs to define system
    int number_of_nodes;          // Number of Nodes in the System
    int number_of_queues;         // Number of Queues in the System
    int number_of_processors;     // Number of Processors in the System
    long latency;                 // Fixed Scheduler Latency
    long fixed_setup;             // Fixed Setup time (a)
    int comm;                     // Communication Time for One Word of Information

    long setup, breakdown;        // Setup and Breakdown Times Per Node
    queue_type queue[250];        // System Queues
    node_type node[250];          // System Nodes
    node_type order[50][50];      // Matrix to store Node structures
    node_type new_order[50][50];  // Matrix to store ordered Node Structures
    order_in_type ORDER[250];     // Node order matrix

public:

    // Class Constructor
    node_alloc();

    // Function to load timing information into the system
    void define_times();

    // Function to read number of processors into the system
    void read_processor_file();

    // Function to load queue data into the system
    void read_queue_file();

    // Function to load the node data into the system
    void change_node_file();

    // Function to Order the Nodes and Create the ORDER.IN File
    void order_nodes();

    // Function to calculate the unused execution cycles
    void calc_unused_exe_cycles();

    // Function to implement wrap-around
    void wrap_around();

    // Function to Create cylinder file
    void make_cylinder_file(long);

    // Function to print processor statistics
    void generate_processor_stats(int, long, long, long, long, long, long, long);

    // Function to print cylinder statistics
    void generate_statistics(long, long, long, long, long, long, long, long);

    // Function to reorder the ORDER.IN file sequentially
    void sequence_nodes();

    // Class Destructor
    ~node_alloc() {}

};

#endif
// LIEUTENANT JOHN P. CARDANY, U.S. NAVY
// 28 April 1994

// NAVAL POSTGRADUATE SCHOOL

// ADVISORS: PROFESSORS SHRIDHAR SHUKLA AND AMR ZAKY

//  Input-Ou^t  I^ua  Class  Source  Hie 
// node_alloc.C


#include "node_alloc.h"

#include <iostream.h>
#include <iomanip.h>
#include <stdlib.h>


// Class Constructor
node_alloc::node_alloc()
{
    number_of_nodes = 0;
    number_of_queues = 0;
    number_of_processors = 0;
    comm = 0;
}


// Function to Load Timing Information into the System
void
node_alloc::define_times()
{
    cout << "\nFixed Setup Time (cycles) : ";
    cin >> fixed_setup;

    cout << "\nWord Communications Time (cycles) : ";
    cin >> comm;

    if ( fixed_setup < 0 || comm < 0 )
    {
        cerr << "\nInvalid Communication Time\n";
        exit(0);
    }
}


// Function to Read the Processor File
void
node_alloc::read_processor_file()
{
    ifstream processor_input_file;
    processor_input_file.open("PROCS.IN");
    if ( !processor_input_file )
    {
        cerr << "\nCannot Open file PROCS.IN\n";
        exit(0);
    }

    processor_input_file >> number_of_processors;
    cout << "\nNumber of Processors: " << number_of_processors << "\n\n";
    processor_input_file.close();
}

// Function to Load the Queue Data File
void
node_alloc::read_queue_file()
{
    ifstream queue_input_file;
    queue_input_file.open( "QUEUES.IN" );
    if ( !queue_input_file )
    {
        cerr << "\nCannot open file QUEUES.IN\n";
        exit(0);
    }

    queue_input_file >> number_of_queues;

    for ( int cnt = 0; cnt < number_of_queues; cnt++ )
    {
        queue_input_file >> queue[ cnt ].queue_id;
        queue_input_file >> queue[ cnt ].source_node;
        queue_input_file >> queue[ cnt ].sink_node;
        queue_input_file >> queue[ cnt ].write_amount;
        queue_input_file >> queue[ cnt ].read_amount;

        if ( queue[ cnt ].queue_id <= 0 )
        {
            cerr << "\nInvalid Queue ID or Wrong Quantity";
            exit(0);
        }

        if ( queue[ cnt ].write_amount < 0 || queue[ cnt ].read_amount < 0 )
        {
            cerr << "\nInvalid Parameter for Queue : " << setw(6);
            cerr << queue[ cnt ].queue_id << endl;
            exit(0);
        }

        for ( int cntq = 0; cntq < cnt; cntq++ )
        {
            if ( queue[ cntq ].queue_id == queue[ cnt ].queue_id )
            {
                cerr << "\nDuplicated Queue ID : " << setw(6);
                cerr << queue[ cnt ].queue_id << endl;
                exit(0);
            }
        }
    }

    queue_input_file.close();
}
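
The queue file reader above validates three things per record: a positive queue ID, non-negative read and write amounts, and no repeated IDs. The following is a minimal sketch of the same checks in standard C++, reading from any input stream and reporting problems as exceptions instead of calling exit. The names Queue, read_queues and read_queues_from_text are hypothetical, and the record layout (count, then id, source, sink, write amount, read amount) is taken from the order of the reads in read_queue_file.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Queue {
    int queue_id;
    int source_node;
    int sink_node;
    long write_amount;
    long read_amount;
};

// Reads a queue count followed by that many records, applying the same
// validity checks as read_queue_file.
std::vector<Queue> read_queues(std::istream& in)
{
    int count = 0;
    if (!(in >> count) || count < 0)
        throw std::runtime_error("missing or invalid queue count");

    std::vector<Queue> queues;
    std::set<int> seen_ids;
    for (int i = 0; i < count; ++i) {
        Queue q;
        if (!(in >> q.queue_id >> q.source_node >> q.sink_node
                 >> q.write_amount >> q.read_amount))
            throw std::runtime_error("truncated queue record");
        if (q.queue_id <= 0)
            throw std::runtime_error("invalid queue id");
        if (q.write_amount < 0 || q.read_amount < 0)
            throw std::runtime_error("invalid queue parameter");
        if (!seen_ids.insert(q.queue_id).second)
            throw std::runtime_error("duplicated queue id");
        queues.push_back(q);
    }
    return queues;
}

// Convenience wrapper for in-memory text, useful for testing.
std::vector<Queue> read_queues_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_queues(in);
}
```

Using a set for the duplicate check avoids the nested loop over earlier records, although for the 250-queue limit of this program either approach is adequate.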


// Function to Change the Node Data File
void
node_alloc::change_node_file()
{
    ifstream node_input_file;
    node_input_file.open( "NODES.IN" );
    if ( !node_input_file )
    {
        cerr << "\nCannot open file NODES.IN\n";
        exit(0);
    }

    ofstream node_output_file;
    node_output_file.open("TEMP.OUT");
    int n_id, i_size, p_type;
    unsigned long s_time, e_time, b_time; // s=setup, b=breakdown, e=execution

    node_input_file >> number_of_nodes;
    node_output_file << number_of_nodes << newln << newln;

    for ( int cnt = 0; cnt < number_of_nodes; cnt++ )
    {
        node_input_file >> n_id >> i_size >> s_time >> e_time >> b_time >> p_type;

        if ( n_id <= 0 )
        {
            cerr << "\nInvalid Node ID or Wrong Quantity";
            exit(0);
        }

        if ( s_time < 0 || e_time < 0 || b_time < 0 || i_size < 0 )
        {
            cerr << "\nInvalid Parameter for Node : " << setw(6) << n_id << endl;
            exit(0);
        }

        long setup = 0;
        long breakdown = 0;

        for ( int cntq = 0; cntq < number_of_queues; cntq++ )
        {
            if ( queue[ cntq ].source_node == n_id )
            {
                breakdown += ( comm * queue[ cntq ].write_amount );
            }

            if ( queue[ cntq ].sink_node == n_id )
            {
                setup += ( comm * queue[ cntq ].read_amount );
            }
        }

        setup += fixed_setup;
        breakdown += fixed_setup;
        s_time = setup;
        b_time = breakdown;

        node_output_file << n_id << setw(4) << i_size << setw(8) << s_time << setw(12) <<
            e_time << setw(13) << b_time << setw(14) << p_type << newln;
    }

    node_input_file.close();
    node_output_file.close();
systoScrSmUESm  NODES JN.CX^G”); 
    system("mv TEMP.OUT NODES.IN");

} 
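
The function above replaces each node's setup and breakdown times with values derived from the queues attached to it: setup is the fixed setup time plus the per-word communication cost times the words read from every input queue, and breakdown is the fixed setup time plus the per-word cost times the words written to every output queue. A minimal sketch of that calculation, isolated from file handling, is given below. The names Queue, NodeTimes and communication_times are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Queue {
    int source_node;    // node that writes to the queue
    int sink_node;      // node that reads from the queue
    long write_amount;  // words written per firing
    long read_amount;   // words read per firing
};

struct NodeTimes {
    long setup;
    long breakdown;
};

// setup     = fixed_setup + comm * (sum of read amounts on input queues)
// breakdown = fixed_setup + comm * (sum of write amounts on output queues)
NodeTimes communication_times(int node_id, const std::vector<Queue>& queues,
                              long comm, long fixed_setup)
{
    assert(comm >= 0 && fixed_setup >= 0);

    NodeTimes t = { fixed_setup, fixed_setup };
    for (const Queue& q : queues) {
        if (q.source_node == node_id) t.breakdown += comm * q.write_amount;
        if (q.sink_node == node_id)   t.setup     += comm * q.read_amount;
    }
    return t;
}
```

For example, with a cost of 3 cycles per word and a fixed setup of 10 cycles, a node that reads 100 words from one queue and writes 50 words to another gets a setup time of 310 cycles and a breakdown time of 160 cycles.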
// Function to Schedule the Nodes and create the ORDER.IN file
void
node_alloc::order_nodes()
{
    int loop, count, tail_index, swap_index;
    int SWAPPED, T_MOVE;
    node_type *Head_ptr, *Curr_ptr, *Tail_ptr;
    node_type TEMP_NODE;

    ifstream node_input_file;
    node_input_file.open("NODES.IN");

    ofstream order_output_file;
    order_output_file.open("ORDER.IN");
    node_input_file >> number_of_nodes;

    for (int cnt = 0; cnt < number_of_nodes; cnt++) // Place nodes in an array
    {
        node[cnt].nodes_per_processor = 0;
        node_input_file >> node[cnt].node_id;
        node_input_file >> node[cnt].instr_size;
        node_input_file >> node[cnt].setup_time;
        node_input_file >> node[cnt].exe_time;
        node_input_file >> node[cnt].breakdown_time;
        node_input_file >> node[cnt].proc_type;
        node[cnt].start_time = 0;
    }

    for (int i = 0; i < number_of_processors; i++ ) // Place the nodes in two 2-D matrices
    {
        count = 0;

        for (int j = 0; j < number_of_nodes; j++ )
        {
            if ( node[j].proc_type == i+1 )
            {
                order[count][i] = node[j];
                order[count][i].nodes_per_processor = count+1;
                new_order[count][i] = node[j];
                new_order[count][i].nodes_per_processor = count+1;
                count++;
            }
        }

        order[count][i].node_id = NULL;
        new_order[count][i].node_id = NULL;
    }

    // Order Nodes in decreasing Exe time

    for (int j = 0; j < number_of_processors; j++ )
    {
        // then swap nodes
        int node_index = 0;

        SWAPPED = 1;
        while (SWAPPED)
        {
            SWAPPED = 0;

            for ( i = 0; i < number_of_nodes; i++ )
            {
                if ( order[i][j].exe_time < order[i+1][j].exe_time )
                {
                    TEMP_NODE = order[i][j];
                    order[i][j] = order[i+1][j];
                    order[i+1][j] = TEMP_NODE;

                    SWAPPED = 1; // and set a flag
                }
            }
        }
    }

    // Order nodes by comparing Exe and Setup times

    for (j = 0; j < number_of_processors; j++ )
    {
        int node_index = 0;
        T_MOVE = 0;
        Head_ptr = &order[node_index][j];
        Tail_ptr = &order[node_index+1][j];
        Curr_ptr = Tail_ptr;

        if ( Tail_ptr->node_id == NULL ) // only one node
        {
            order_output_file << Head_ptr->node_id << setw(8) << Head_ptr->start_time
                << newln;
        }

        while ( Tail_ptr->node_id != NULL ) // check all nodes on a processor
        {
            SWAPPED = 0;
            T_MOVE = 0;
            swap_index = node_index;

            while (Head_ptr->exe_time < Tail_ptr->setup_time) // keep swapping nodes until
            {                                                 // condition fails
                tail_index = swap_index + 2;
                Tail_ptr = &order[tail_index][j]; // point to next node to check condition again
                T_MOVE = 1; // set flag to indicate tail ptr was moved
                swap_index++;
            }

            // swap Tail_ptr and Curr_ptr to put tail node in position after head node
            if (T_MOVE && Tail_ptr->node_id != NULL)
            {
                TEMP_NODE.node_id = Curr_ptr->node_id;
                TEMP_NODE.instr_size = Curr_ptr->instr_size;
                TEMP_NODE.setup_time = Curr_ptr->setup_time;
                TEMP_NODE.exe_time = Curr_ptr->exe_time;
                TEMP_NODE.breakdown_time = Curr_ptr->breakdown_time;
                TEMP_NODE.proc_type = Curr_ptr->proc_type;
                TEMP_NODE.start_time = Curr_ptr->start_time;

                Curr_ptr->node_id = Tail_ptr->node_id;
                Curr_ptr->instr_size = Tail_ptr->instr_size;
                Curr_ptr->setup_time = Tail_ptr->setup_time;
                Curr_ptr->exe_time = Tail_ptr->exe_time;
                Curr_ptr->breakdown_time = Tail_ptr->breakdown_time;
                Curr_ptr->proc_type = Tail_ptr->proc_type;
                Curr_ptr->start_time = Tail_ptr->start_time;

                Tail_ptr->node_id = TEMP_NODE.node_id;
                Tail_ptr->instr_size = TEMP_NODE.instr_size;
                Tail_ptr->setup_time = TEMP_NODE.setup_time;
                Tail_ptr->exe_time = TEMP_NODE.exe_time;
                Tail_ptr->breakdown_time = TEMP_NODE.breakdown_time;
                Tail_ptr->proc_type = TEMP_NODE.proc_type;
                Tail_ptr->start_time = TEMP_NODE.start_time;

                SWAPPED = 1; // set flag to indicate nodes swapped

                Curr_ptr->start_time = (Head_ptr->setup_time + Head_ptr->start_time);

                // Nodes were not swapped, only two nodes in array
                if (node_index == 0) // node is head ptr, put node in new array
                {
                    order_output_file << Head_ptr->node_id << setw(8) << Head_ptr->start_time
                        << newln;

                    new_order[node_index][j].node_id = Head_ptr->node_id;
                    new_order[node_index][j].instr_size = Head_ptr->instr_size;
                    new_order[node_index][j].setup_time = Head_ptr->setup_time;
                    new_order[node_index][j].exe_time = Head_ptr->exe_time;
                    new_order[node_index][j].breakdown_time = Head_ptr->breakdown_time;
                    new_order[node_index][j].proc_type = Head_ptr->proc_type;
                    new_order[node_index][j].start_time = Head_ptr->start_time;
                }

                // Put node after head node into order file and array
                order_output_file << Curr_ptr->node_id << setw(8) << Curr_ptr->start_time <<
                    newln;

                new_order[node_index+1][j].node_id = Curr_ptr->node_id;
                new_order[node_index+1][j].instr_size = Curr_ptr->instr_size;
                new_order[node_index+1][j].setup_time = Curr_ptr->setup_time;
                new_order[node_index+1][j].exe_time = Curr_ptr->exe_time;
                new_order[node_index+1][j].breakdown_time = Curr_ptr->breakdown_time;
                new_order[node_index+1][j].proc_type = Curr_ptr->proc_type;
                new_order[node_index+1][j].start_time = Curr_ptr->start_time;
            }

            if ( Tail_ptr->node_id != NULL && !SWAPPED ) // Nodes were not swapped,
            {                                            // no node matches requirement
                                                         // so sked node after head node
                Tail_ptr->start_time = (Head_ptr->setup_time + Head_ptr->start_time);

                if ( node_index == 0 )
                {
                    order_output_file << Head_ptr->node_id << setw(8) << Head_ptr->start_time
                        << newln;

                    new_order[node_index][j].node_id = Head_ptr->node_id;
                    new_order[node_index][j].instr_size = Head_ptr->instr_size;
                    new_order[node_index][j].setup_time = Head_ptr->setup_time;
                    new_order[node_index][j].exe_time = Head_ptr->exe_time;
                    new_order[node_index][j].breakdown_time = Head_ptr->breakdown_time;
                    new_order[node_index][j].proc_type = Head_ptr->proc_type;
                    new_order[node_index][j].start_time = Head_ptr->start_time;
                }

                order_output_file << Tail_ptr->node_id << setw(8) << Tail_ptr->start_time <<
                    newln;

                new_order[node_index+1][j].node_id = Tail_ptr->node_id;
                new_order[node_index+1][j].instr_size = Tail_ptr->instr_size;
                new_order[node_index+1][j].setup_time = Tail_ptr->setup_time;
                new_order[node_index+1][j].exe_time = Tail_ptr->exe_time;
                new_order[node_index+1][j].breakdown_time = Tail_ptr->breakdown_time;
                new_order[node_index+1][j].proc_type = Tail_ptr->proc_type;
                new_order[node_index+1][j].start_time = Tail_ptr->start_time;
            }
            else // last node to be scheduled
            {
                if ( Curr_ptr->node_id != NULL && !SWAPPED) // sked last node in the array
                {
                    Curr_ptr->start_time = (Head_ptr->setup_time + Head_ptr->start_time);
                    order_output_file << Curr_ptr->node_id << setw(8) << Curr_ptr->start_time
                        << newln;

                    new_order[node_index+1][j].node_id = Curr_ptr->node_id;
                    new_order[node_index+1][j].instr_size = Curr_ptr->instr_size;
                    new_order[node_index+1][j].setup_time = Curr_ptr->setup_time;
                    new_order[node_index+1][j].exe_time = Curr_ptr->exe_time;
                    new_order[node_index+1][j].breakdown_time = Curr_ptr->breakdown_time;
                    new_order[node_index+1][j].proc_type = Curr_ptr->proc_type;
                    new_order[node_index+1][j].start_time = Curr_ptr->start_time;
                }
            }

            node_index++;
            Head_ptr = &order[node_index][j];
            Tail_ptr = &order[node_index+1][j];
            Curr_ptr = Tail_ptr;
        }
    }

    node_input_file.close();
    order_output_file.close();
}
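
The first ordering pass in order_nodes arranges the nodes assigned to each processor by decreasing execution time, using a repeated exchange of adjacent elements. The sketch below shows the same ordering for a single processor's node list using the standard library. It swaps in std::stable_sort for the exchange loop, which is not how the thesis program does it; the Node type and function name are hypothetical. A stable sort is chosen because the original loop only exchanges when the earlier node's time is strictly smaller, so nodes with equal execution times keep their input order.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    int node_id;
    long exe_time;
};

// Orders one processor's nodes by decreasing execution time.
// Nodes with equal execution times keep their original relative order.
void order_by_decreasing_exe_time(std::vector<Node>& nodes)
{
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const Node& a, const Node& b) {
                         return a.exe_time > b.exe_time;
                     });
}
```

Holding each processor's nodes in a sized container also removes the need for the NULL node_id sentinel that terminates each column of the order matrix.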

// Calculate unused execution cycles

void

node_alloc::calc_unused_exe_cycles()
{
    node_type *Head_ptr, *Tail_ptr, *Curr_ptr;
    long unused_exe_cycles = 0;
    long total_unused_exe_cycles = 0;

    for (int j = 0; j < number_of_processors; j++ )
    {
        int node_index = 0;

        Head_ptr = &new_order[node_index][j];

        Tail_ptr = &new_order[node_index+1][j];

        Curr_ptr = Tail_ptr;

imuaedjBxejcydes  Iieadi.jMr->fletq>.tiine; 
while  (  TaiU>tr->iiode Jd !«  NULL ) 
int  swiq) Jndex  s  node Jndex; 
if  ( Head_pir->exe.time  <  TaiLptr->«iMD9.time ) 

umiaedjBXBjcycles  4e  T8iLl>tr->tetiq>Ltime  -  IiBad_jMr->enLtime; 


if  (  Head,jMr->hiealalowiLJtinie  >  TaiLphr->exe_.time ) 

im  taiUndex  s  swin jndex  2; 

TaiLptr  s  ftaider{taujndex][|]; 

imiisedjBxe.CTdes  Iiead_ptr'->taeikdowq_time  •  CiiiT_ptr->exe jdme 

TnLptr->9etiq>.time  CWjMr->bfeiUowiL.time; 
nodeJindex*H>; 

) 

nodeJndex-H-; 

HeMLptr  s  &iiBw„oidei(nodeJndexm]; 

TuLptr  B  &new_o(der[nodejndex-i'l]^; 

Cnniptr  s  TaiLptr; 


if  ( TaiLptr->iiode Jd  «  NULL  AA  HeadLptr->iiode Jd !»  NULL ) 

{  //last node,  add  breakdown 

imnsedjBxejcydes  HeadLpOv>brealcdowiutinie; 


totaLunuKdLexe.cydes  4«  unuedjexejcycks; 
omuedjBxe jcydes  »  0; 
}

cout << "Total Unused execution cycles are: " << total_unused_exe_cycles << " cycles "
<< newln;

} 


// Function to implement the wrap-around
void
node_alloc::wrap_around()
{
    // Initialize values

    long Largest_cyl_time = 0, total_slice_times = 0;
    long idle_exe_cycles = 0, blocked_exe_cycles = 0;
    long idle_ctrl_cycles = 0, blocked_ctrl_cycles = 0;
    long blocked_proc_ctrl = 0, blocked_proc_exe = 0;
    long idle_proc_exe = 0, idle_proc_ctrl = 0;
    long total_idle_proc_exe = 0, total_idle_proc_ctrl = 0;
    long total_blocked_proc_exe = 0, total_blocked_proc_ctrl = 0;
    long exe_packing = 0, ctrl_packing = 0;

    node_type *Head_ptr, *Tail_ptr, *Next_ptr;

    ofstream cylTime_output_file;
    cylTime_output_file.open("CYLTIMES.OUT");
    if ( !cylTime_output_file )
    {
        cerr << "\nCannot Open file CYLTIMES.OUT\n";
        exit(0);
    }

    ofstream slice_output_file;
    slice_output_file.open("slice_time.out");
    if ( !slice_output_file )
    {
        cerr << "\nCannot Open file slice_time.out\n";
        exit(0);
    }

    // calculate the number of nodes on each processor
    for (int k = 0; k < number_of_processors; k++)
    {
        int index = 0;

        Head_ptr = &new_order[index][k];
        while ( Head_ptr->node_id != NULL )
        {
            index++;
            Head_ptr = &new_order[index][k];
        }

        new_order[0][k].nodes_per_processor = index;
    }


forCint  j  »  0;  j  <  number.ofjxocessois;  >h-) 

HeadLpn- s  &new.(MdeifO]|j]:  /^xnnt  to  first  mxle 
Head_ptF->staiL.time  » 

int  numNodes  »  Head_ptr-^odes.j)er_processor, 

long  cyLtune  s  Head_pir->9etiq>jdine; 
ldodced_|mxLCtrl  s  0; 
blodoed_proc.exe  s  O; 

int  FLAG  »  0.  STUFFED  »  0.  PUSHED  »  0; 


tf(numNodes  =  1)  //Only  one  node  on  processor 

cyLtime  (Head_ptr->exejthne  *i-  Head_ptr->bfeakdowiutinie); 

tf(Head_ptr->breakdown_time  +  Head^->setiip.time  <  Head_ptr->exe_time) 

Ifead_ptr->end.tiine  a  cyLtime  -  Headjptr->breakdowp_.tiine; 
cyLtime  Hddjptr->brcakdowitjtime; 

else 

{ 

Ifead_ptr->end_time  a  cyLtime; 

LaigesL.cyLtime  a  Head_ptr->end jdme; 


for(int  i  a  i;i<  numNodes;  144^)  //M(»etiian  one  node 

Tail_ptra&new_ordet[Qm;  //{x^  to  next  node 
Tail_ptr->startjdme  a  c^time; 

NexLptr  a  &new_ocder[i4l](j]; 

//Several  conditkms  are  possible  modify  die  ws^  the  blocked  cycles  are  cakndaiBd 

//Conditian  1 

if(PUSHED  &&  N6xLptr->node_id  !a  0)  //PUSHED  a>A  node's  bieakdown 
{  //was  greater  than  anotiier  node's 

Headjptra&ncw_orderp]|j];  //exettme 

Tail_ptr  a  &new_ordeip4l][g; 

++i; 

Nex^Mr  a  &new_ofderp4l](j]; 
c^time  4a  Head_ptr->seti9.time; 

TaiLptr->staiLtime  a  cyLtime; 

PUSHED  a  0; 
}


//Coiiditioo2 

if  (.'FLAG  &&  iPUSHED)  //FLAGs> 

{ 

if  (Head_ptr->exe_tiine  >  TaiLptr>>seti^>_time) 
cyLtime  ■»«  Head_j)tr->e»jtinie; 

faiockBd.ctriLcycks  4«s  IieadLptr->exejiine  -  TaiU[Mr->setiq>_tiiiie; 
blocked_j»ocjctfl  4«s:  HeadLptr->exe_tiiiie  -  TaUj>ir->8etiq)_tiiiie; 

else 

{ 

c^time  -fe  Tail_ptr>>seti^ jdnie; 

UodcedjexB.cycles  •»«=  TaLLptr-»etiq>_tiine  •  Head_ptr’>exe_tiiiie; 
blocked_pfoc_exe  4<s  Tail_ptr->fletiip.tinBe  -  IfeiidUptr->exe_time; 

1  ^ 

//Cmidiiioo  3  _ 

if(!STUFFQ>  &&  IPUSHED) //STUFFEDb>  breakdown  of  a  node  and  aehq)  of 
{  //next  node  occur  during  exe  of  another  node 

Head_i>tr->end.tune  =  cyLtime  +  Head_pir->lMea]^wn_time; 


//Condidond 

if(Ifead4)tr>>breakdowiL.tinie  <  TaiLptr->«te_time  &&  iPUSHED) 

cyljtime4eTafl_ptr->exe  time; 

FLAGsO; 

TaiLptr'->^.time  s  cyLtime  •i'  TaiLptr->breakdowitJime; 


//Oxiditioa  5 


if((Head_ptr->breakdowi|_time  +  NexLptr->setim  time)  <  TaiLptr->exe_time 
&&  Next_^>>nodeJd  is  0) 

FLAG  si; 

STUFFED  s  1; 

NexLptr->statLtime  s  cyLtime  -  TaiLpcr->exe_time + TaiLptr- 
>brei&downLtime; 

blodoed jctiLcydes  4s  TaiLptr->exe_time  -  Head  j>tr->breakdowit.time  - 
Nextj)tr->setiq>_time; 

bkKkBd_pcDCjctri  4s  TaiLptr->exe .time  -  Ifead_ptr>>breakdowit_time  - 
NexLptr->8etiq>_time; 

else 


^[NexLptr-dnode.id  isO) 


STUFFED  s  1; 

FLAG  si; 

NmcLptr->staiLtime  s  cyLtime  -  TaiLptr->exe_time  4-  TaiLptr- 
>breakdownjime; 
blocked_exe_cycles += Head_ptr->breakdown_time + Next_ptr->setup_time
    - Tail_ptr->exe_time;

blocked_proc_exe += Head_ptr->breakdown_time + Next_ptr->setup_time
    - Tail_ptr->exe_time;

-Kb  blodDedjMoc.exe; 

TaiLl>tr->en(Ltinie  4c:  blodced_proc_exe; 


} 

} 

) 

elae 

{ 

if(Next_ptr->iiode_id  ss  0  &&.  PUSHED) 

{ 

cyLdme  4=  (TaU_ptr->8etup_time  4  Tajlj)tr->exe_time); 

Tail_^*>endjiine = cyljtiiiie  4  TaiLp(r->bfeakdownjdme; 
cyljiine  s  TaU_ptr->ei^_tiiiie; 

) 

else 

{ 

if(!PUSHED) 

{ 

PUSHED  si; 

cyLtime  4c:  Head_ptr->brBalak)wi^tiiK 

TaiL^->eiid June  s  c^time  4  Tailj)U*>biea]alowiL.time; 

cyOinie  »  TaiLptF->ei^ jime; 

blodcedjBxej^ycles  4s:  Headjxr->biealDdowiutiine  -  TajLptr-x^e.tune; 
blodDedjvocjexe  4s:  Head j>tr’>bieakdowA_.tin)e  -  TaiLplr->exe.time; 

)  ^ 

} 

Head_ptr  s  &newjinkr[i][j]; 


ifCnoniNodes  !=  1  &&  !PUSHED) 

{ 

cyLtime  4cs  Tail j)tr->breakdafwpjime; 

} 


//  Chedc  for  wn^around  cooditioo 
Headj>tr  s  dl^wj)rder[0][j];  //  Pdnt  to  fost  node  on  processor 
NexLj>trs&iiew_oiider[l](j];  //Poiitt  to  second  node 

if(CraiLptr->bieakdowit_time  4  Head_ptr->biealGdowiL.tiine  4  Nextj>tr- 
>setup Jime)  <s  (Head_ptr->exe Jime  4  Next_ptr->e«_time)  &&  Ifoad jMr- 
>nodesj)erjnocessor  !s  1  &&  ISTUFFED  &A  ’PUSHED) 

{ 

TaU^->end Jime  s  Ifead jNr->setiq> Jime  4  Ne]cLptr*>8etiq> Jime  4  TaiLptr- 
>bieakdownjime; 

cyLtime  -s  Tail jrtr->bieakdowp_time; 
if (Next_ptr->setup_time + Tail_ptr->breakdown_time < Head_ptr->exe_time)

{

Hcad_j)tr->cnd_time  =  He!ad_ptr->sct»q)_tinic  +  Hcad_ptr->cxc_timc  + 
Head_ptr->breakdown_time; 

blocked.ctil.cycles  -=  Tail_ptr->breakdown_timc; 
blocked_proc_ctri  -=  Tail_j>tr->breakdown_timc; 

else 

Head^tr->end_time = TaUjptr->end_time  +  Head_ptr->bteakdown_time; 

else  //STUFFED  or  PUSHED 

{ 

if(STUFFED) 

{ 

nodc_type  ♦Third_ptr  =  &iiew_oidef(2][j]; 

if  ((Tail_ptr->bieakdown_time  +  Head_ptr->bieakdown_time  +  Ne3rt_^tr- 
>setup_tiiiie  +  Thiid_ptr->seti:i|)_tinv‘  <s  Head_ptr->exe_time  +  NexLptr->exe_tiiiie)  && 
Head_ptr->nodes_per_processor  Is  1) 

{ 

Tail_ptF->eiKLtiine  =  Head_ptr->setup_tiiDe  +  Next j>tr->setup_time  + 
Tail_ptr->breakdown_tiine; 

cyljdme  -=  TatLptr->b(eakdown.tiine; 

if  (Next  Dtr->setup  time  +  TaiLptr->toeakdown_time  <  Head_ptr- 

>exe.tiiiie) 

{ 

Head_ptr>>end_time  s  Head_ptr->setiq>_time  +  Head_jitr->exe_time  + 
Head_ptr->breakdowit_time; 

blockedjctriLcycles  -s  Tail_ptr->bieakdowit_tiiiie; 
blocked jxoc.cttl  -s  Tul._jm>>bieakdowiiL.time; 

else 

Head_ptr->end_time  s  Tail_ptr->eiid_time  +  Hea(i_ptr-'>bieakdown_tiiiie; 
Thinl_ptr->statt_time  =  Head_ptr->end_time; 

} 

} 

else 

{ 

if(PUSH^) 

{ 

if(Nextptr->setim  time + TaU_ptr->breakdown_time <s Head  otr- 

>exe_time) 

{ 

Tail_ptr->5nd_timc  =  Ifcad_ptr->semp_time  +  Nex^j)tr->setup_time  + 
TaiLptr->bteakdowiijtime; 

cyLtime -s  TaiLptr->bieakdown  time; 

} 

} 

) 

} 

total_slice_times += cyl_time; // add all cylinder times
slice_output_file << cyl_time << endl;

total_blocked_proc_ctrl += blocked_proc_ctrl;
total_blocked_proc_exe += blocked_proc_exe;

cylTime_output_file << j+1 << setw(8) << cyl_time << endl;

Head_ptr = &new_order[0][j];

exe_packing = 0, ctrl_packing = 0;

for ( int p = 0; p < numNodes; p++ ) // Calculate exe and control unit packing
{
    Head_ptr = &new_order[p][j];
    exe_packing += Head_ptr->exe_time;
    ctrl_packing += Head_ptr->setup_time + Head_ptr->breakdown_time;
}

// Calculate idle cycle times

idle_proc_exe = cyl_time - exe_packing - blocked_proc_exe;
idle_proc_ctrl = cyl_time - ctrl_packing - blocked_proc_ctrl;
total_idle_proc_exe += idle_proc_exe;
total_idle_proc_ctrl += idle_proc_ctrl;

generate_processor_stats(j+1, cyl_time, exe_packing, ctrl_packing, idle_proc_exe,
    idle_proc_ctrl, blocked_proc_exe, blocked_proc_ctrl);

slice.ou^uLfiIe.closeO; 

) 

//  Hnd  the  laigest  end  time  for  all  processors  for  "flat”  cylinder 
for(int  m  =  0;  m  <  number_oLprooessors:  m-H-) 
int  index  =  0; 

Head_ptr  s  &new.order[index][m]; 
while  ( Head j)tr->nodeJd  !=  NULL  ) 
index4-f; 

if(Laigest_c^_tinie  <  Head_ptr->end_time) 

LaigBst_cyLtime  =  Head_ptr->end_time; 
cynime_ou^ut_^  «  endl «  endl «  LargesLcyLtime  «  endl; 

Head_ptr  s  &new_order[index][m]; 


// Find idle time for "flat" cylinder

idle_ctrl_cycles = (Largest_cyl_time*number_of_processors) - blocked_ctrl_cycles;
idle_exe_cycles = (Largest_cyl_time*number_of_processors) -
    blocked_exe_cycles;


make_cylinder_file(Largest_cyl_time);

generate_statistics(Largest_cyl_time, idle_ctrl_cycles, blocked_ctrl_cycles,
    idle_exe_cycles, blocked_exe_cycles, total_idle_proc_exe, total_idle_proc_ctrl,
    total_slice_times);

cylTime_output_file.close();
}

// Function to create the cylinder output file
void
node_alloc::make_cylinder_file(long Largest_cyl_time)
{
    node_type *Head_ptr;

    ofstream cylinder_output_file;
    cylinder_output_file.open("cylinder.out");
    if ( !cylinder_output_file )
    {
        cerr << "\nCannot Open file cylinder.out\n";
        exit(0);
    }

    for ( int j = 0; j < number_of_processors; j++ ) // Print out node order
    {
        Head_ptr = &new_order[0][j];

        int processorNodes = Head_ptr->nodes_per_processor;

        cylinder_output_file << newln << processorNodes << endl << endl;

        for ( int i = 0; i < processorNodes; i++ )
        {
            Head_ptr = &new_order[i][j];
            cylinder_output_file << setw(7) << Head_ptr->node_id;
            cylinder_output_file << setw(12) << Head_ptr->start_time;
            cylinder_output_file << setw(12) << Head_ptr->end_time;
            cylinder_output_file << endl;
        }
    }

    cylinder_output_file << newln << newln << Largest_cyl_time << endl;
    cylinder_output_file.close();
}

// Function to print individual processor statistics
void
node_alloc::generate_processor_stats(int procNum, long cyl_time, long exe_packing, long
    ctrl_packing, long idle_proc_exe, long idle_proc_ctrl, long blocked_proc_exe, long
    blocked_proc_ctrl)
{
    ofstream processor_stats_file;
    processor_stats_file.open("proc_stats.out", ios::app);
    if ( !processor_stats_file )
    {
        cerr << "\nCannot Open file proc_stats.out\n";
        exit(0);
    }

    processor_stats_file << "PROCESSOR UTILIZATION\n\n";
    processor_stats_file << "NUMBER OF PROCESSORS : ";
    processor_stats_file << setw(4) << number_of_processors << endl << endl;
    processor_stats_file << "CYCLES PER WORD : ";
    processor_stats_file << setw(4) << comm << endl << endl;

    double ctrl_util_rate = (double)ctrl_packing/cyl_time*100.0;
    double ctrl_idle_rate = (double)idle_proc_ctrl/cyl_time*100.0;
    double ctrl_blocked_rate = (double)blocked_proc_ctrl/cyl_time*100.0;

    double exe_util_rate = (double)exe_packing/cyl_time*100.0;
    double exe_idle_rate = (double)idle_proc_exe/cyl_time*100.0;
    double exe_blocked_rate = (double)blocked_proc_exe/cyl_time*100.0;

    processor_stats_file << "PROCESSOR NUMBER :";
    processor_stats_file << setw(4) << procNum << endl;

    processor_stats_file << "\nCONTROL UNIT UTILIZATION\n\n";

    processor_stats_file.setf(ios::fixed);
    processor_stats_file.setf(ios::showpoint);

    processor_stats_file << "BEST CYLINDER PACKING ( CONTROL TIME ) : ";
    processor_stats_file << setw(12) << ctrl_packing << endl << endl;
    processor_stats_file << "END TIME OF ACTIVITIES : ";
    processor_stats_file << setw(12) << cyl_time << endl << endl;
    processor_stats_file << "Control Unit Utilization Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << ctrl_util_rate << "%\n";
    processor_stats_file << "Control Unit Idle Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << ctrl_idle_rate << "%\n";
    processor_stats_file << "Control Unit Blocked Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << ctrl_blocked_rate << "%\n\n\n\n\n";

    processor_stats_file << "EXECUTION UNIT UTILIZATION\n\n";

    processor_stats_file << "BEST CYLINDER PACKING ( EXECUTION TIME ) : ";
    processor_stats_file << setw(12) << exe_packing << endl << endl;
    processor_stats_file << "END TIME OF ACTIVITIES :";
    processor_stats_file << setw(12) << cyl_time << endl << endl;
    processor_stats_file << "Execution Unit Utilization Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << exe_util_rate << "%\n";
    processor_stats_file << "Execution Unit Idle Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << exe_idle_rate << "%\n";
    processor_stats_file << "Execution Unit Blockage Rate : ";
    processor_stats_file << setw(6) << setprecision(1);
    processor_stats_file << exe_blocked_rate << "%\n\n\n";
    processor_stats_file << endl;
}
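
The per-processor statistics split a processor's cylinder time into three parts for each unit: packed (useful) cycles, blocked cycles, and idle cycles, where idle is whatever remains after the other two are subtracted. Each part is then reported as a percentage of the cylinder time. The sketch below isolates that arithmetic for one unit. The names UnitRates and unit_rates are hypothetical, and the function is an illustration of the calculation, not part of the thesis program.

```cpp
#include <cassert>

struct UnitRates {
    long idle_cycles;     // cyl_time - packing - blocked
    double util_rate;     // percent of cylinder time doing useful work
    double idle_rate;     // percent of cylinder time idle
    double blocked_rate;  // percent of cylinder time blocked
};

// Computes idle cycles and the three percentage rates for one unit
// (control or execution) of one processor.
UnitRates unit_rates(long cyl_time, long packing, long blocked)
{
    assert(cyl_time > 0);

    UnitRates r;
    r.idle_cycles  = cyl_time - packing - blocked;
    r.util_rate    = static_cast<double>(packing) / cyl_time * 100.0;
    r.idle_rate    = static_cast<double>(r.idle_cycles) / cyl_time * 100.0;
    r.blocked_rate = static_cast<double>(blocked) / cyl_time * 100.0;
    return r;
}
```

Because idle cycles are defined as the remainder, the three rates for a unit always sum to 100 percent of that processor's cylinder time.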


//  Function  to  print  cylinder  statistics 
void 

node_alloc::geneiate_statistics(long  Laijestjc^time,  long  idlejctiljcycles,  long 
blockedjctiljcycles,  long  idte.exe.cycles,  long  blocl^_exe_c^les,  long 
totaUdle_proc_exe,  long  totaLidle_pfocjctil«  long  totaLsIice_times) 

Icmg  exe.cyl^jtad^  =  0; 
long  ctrLcyLpacldiig  =  0; 

node.type  *Head_ptr, 

ofstteam  statistics_ou9ut_Blc; 
statistics_ou4)ut_^.openrcyLstats.out"); 
if  ( !statistics_outputJ^ ) 

cerr « "\nCannot  Open  file  cyljstats.outin*; 
exit(0); 

} 

statistics.oo4)otjBle  «  "PROCESSCXl  UTILlZA'nONVnVn"; 
statistics_oo^ot_file  «  "NUMBER  OF  PROCESSORS : 
statistics.ouqniLfile  «  setw(4) «  nomber_cKP_jm)oessois  «  endl «  endl; 
statistics_ootput_file  « "CYCLES  PER  WORD  : "; 
statistics_ooq>ot_file  «  8etw(4) «  comm  «  endl «  endl; 


    for(int j = 0; j < number_of_processors; j++)
    {
        Head_ptr = &new_order[0][j];

        int processorNodes = Head_ptr->nodes_per_processor;

        for(int i = 0; i < processorNodes; i++) //Calculate exe and ctrl unit packing per
        {                                       //processor

            Head_ptr = &new_order[i][j];
            exe_cyl_packing += Head_ptr->exe_time;

            ctrl_cyl_packing += Head_ptr->setup_time + Head_ptr->breakdown_time;
        }
    }

    //Calculate values for "jagged" cylinder
    long avg_slice_time = total_slice_times/number_of_processors;
    long best_exe_packing = exe_cyl_packing/number_of_processors;
    long best_ctrl_packing = ctrl_cyl_packing/number_of_processors;
    long avg_ctrl_idle = (idle_ctrl_cycles/number_of_processors) - best_ctrl_packing;
    long avg_ctrl_blocked = blocked_ctrl_cycles/number_of_processors;
    long avg_exe_idle = (idle_exe_cycles/number_of_processors) - best_exe_packing;
    long avg_exe_blocked = blocked_exe_cycles/number_of_processors;
    long avg_ctrl_idle_jag = (total_idle_proc_ctrl/number_of_processors) -
                             best_ctrl_packing;
    if(avg_ctrl_idle_jag < 0)
        avg_ctrl_idle_jag = 0;

    long avg_exe_idle_jag = (total_idle_proc_exe/number_of_processors) -
                            best_exe_packing;
    if(avg_exe_idle_jag < 0)
        avg_exe_idle_jag = 0;


    //Calculate "flat" and "jagged" cylinder statistics
    double exe_util_rate = (double)best_exe_packing/Largest_cyl_time*100.0;
    double ctrl_util_rate = (double)best_ctrl_packing/Largest_cyl_time*100.0;
    double ctrl_idle_rate = (double)avg_ctrl_idle/Largest_cyl_time*100.0;
    double ctrl_blocked_rate = (double)avg_ctrl_blocked/Largest_cyl_time*100.0;
    double exe_idle_rate = (double)avg_exe_idle/Largest_cyl_time*100.0;
    double exe_blocked_rate = (double)avg_exe_blocked/Largest_cyl_time*100.0;
    double ctrl_idle_rate_jag = (double)avg_ctrl_idle_jag/avg_slice_time*100.0;
    double ctrl_util_rate_jag = (double)best_ctrl_packing/avg_slice_time*100.0;
    double ctrl_blocked_rate_jag = (double)avg_ctrl_blocked/avg_slice_time*100.0;
    double exe_util_rate_jag = (double)best_exe_packing/avg_slice_time*100.0;
    double exe_idle_rate_jag = (double)avg_exe_idle_jag/avg_slice_time*100.0;
    double exe_blocked_rate_jag = (double)avg_exe_blocked/avg_slice_time*100.0;


    statistics_output_file << "\n\n\nCONTROL UNIT UTILIZATION\n\n";

    statistics_output_file.setf(ios::fixed);
    statistics_output_file.setf(ios::showpoint);

    statistics_output_file << "BEST CYLINDER PACKING ( CONTROL TIME ) : ";
    statistics_output_file << setw(12) << best_ctrl_packing << endl << endl;
    statistics_output_file << "END TIME OF ACTIVITIES (FLAT CYLINDER) : ";
    statistics_output_file << setw(12) << Largest_cyl_time << endl << endl;
    statistics_output_file << "Control Unit Utilization Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_util_rate << "%\n";
    statistics_output_file << "Control Unit Idle Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_idle_rate << "%\n";
    statistics_output_file << "Control Unit Blockage Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_blocked_rate << "%\n\n\n";

    statistics_output_file << "END TIME OF ACTIVITIES ('JAGGED' CYLINDER): ";
    statistics_output_file << setw(12) << avg_slice_time << endl << endl;
    statistics_output_file << "Control Unit Utilization Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_util_rate_jag << "%\n";
    statistics_output_file << "Control Unit Idle Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_idle_rate_jag << "%\n";
    statistics_output_file << "Control Unit Blockage Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << ctrl_blocked_rate_jag << "%\n\n\n\n\n\n";


    statistics_output_file << "EXECUTION UNIT UTILIZATION\n\n";

    statistics_output_file << "BEST CYLINDER PACKING ( EXECUTION TIME ) : ";
    statistics_output_file << setw(12) << best_exe_packing << endl << endl;
    statistics_output_file << "END TIME OF ACTIVITIES (FLAT CYLINDER) : ";
    statistics_output_file << setw(12) << Largest_cyl_time << endl << endl;
    statistics_output_file << "Execution Unit Utilization Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_util_rate << "%\n";
    statistics_output_file << "Execution Unit Idle Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_idle_rate << "%\n";
    statistics_output_file << "Execution Unit Blockage Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_blocked_rate << "%\n\n\n";

    statistics_output_file << "END TIME OF ACTIVITIES ('JAGGED' CYLINDER): ";
    statistics_output_file << setw(12) << avg_slice_time << endl << endl;
    statistics_output_file << "Execution Unit Utilization Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_util_rate_jag << "%\n";
    statistics_output_file << "Execution Unit Idle Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_idle_rate_jag << "%\n";
    statistics_output_file << "Execution Unit Blockage Rate : ";
    statistics_output_file << setw(6) << setprecision(1);
    statistics_output_file << exe_blocked_rate_jag << "%\n\n";

    statistics_output_file << flush;


} 


// Function to reorder the ORDER.IN file sequentially
// This function uses a simple bubble algorithm to reorder the ORDER.IN file
void
node_alloc::sequence_nodes()
{
    int i, SWAPPED;
    order_in_type TEMP_NODE;

    ifstream order_input_file;
    order_input_file.open("ORDER.IN");

    ifstream node_input_file;
    node_input_file.open("NODES.IN");
    node_input_file >> number_of_nodes;
    node_input_file.close();

    ofstream order_output_file;
    order_output_file.open("TEMP_ORDER.IN");

    for ( int count = 0; count < number_of_nodes; count++ )
    {
        order_input_file >> ORDER[count].node_id;
        order_input_file >> ORDER[count].start_time;
    }

    // Order Nodes in order of increasing start time
    SWAPPED = 1;
    while (SWAPPED)
    {
        SWAPPED = 0;
        for ( i = 0; i < number_of_nodes; i++ )
            if ( ORDER[i].start_time > ORDER[i+1].start_time ) // Reorder nodes
            {
                TEMP_NODE = ORDER[i];
                ORDER[i] = ORDER[i+1];
                ORDER[i+1] = TEMP_NODE;
                SWAPPED = 1;
            }
    }

    //Put reordered nodes into output file
    for ( i = 0; i <= number_of_nodes; i++ )
    {
        if ( ORDER[i].node_id != NULL )
        {
            order_output_file << ORDER[i].node_id << setw(8) << "0" << newln;
        }
    }

    order_input_file.close();
    order_output_file.close();
    system("mv ORDER.IN ORDER.IN.ORG");
    system("mv TEMP_ORDER.IN ORDER.IN");
    system("mv NODES.IN NODES.SNB.IN");
    system("mv NODES.IN.ORG NODES.IN");
}

//end of program

APPENDIX B: PROGRAM USER'S MANUAL


I. NODE SCHEDULING PROGRAM

This section describes a Large Grain Data Flow node-to-processor scheduling program
(referred to as SCHEDULE) which provides a detailed node-to-processor scheduling of a
data flow graph using the model described in [Ref. 4]. The program uses a two-dimensional
array to represent the revolving cylinder to generate the order the nodes should
enter the system based on input data files and data provided by the user. The program also
determines if the breakdown time of the last node on a processor can be 'wrapped-around'
to provide an accurate modeling of the revolving cylinder. This mapping is only concerned
with the arithmetic processors and the program nodes. Therefore, input and output nodes
and the input/output processors described in [Ref. 4] are not included in this scheduling
program or associated data files. This program must be run prior to executing the mapping
program discussed in Section II. This program begins execution with the command
'schedule'.

A.  USER  INTERFACE 

The following inputs and options are available to the user:

1. SCHEDULER LATENCY TIME

A number which abstractly represents the time it takes the scheduler to change the
state of its local memory when amounts on a queue are modified due to node input or
output.

2. COMMUNICATION TIME FOR ONE WORD

This is the time to transmit one word of data between a memory module and a
processor.

B.  INPUT  FILES 

1. Input File: NODES.IN

This file contains the initial node information required for mapping. The number
of nodes parameter is an individual element. The remaining parameters exist for each node
in the graph.

a. Number of Nodes

This is the total number of nodes in the data flow graph.

b. Node ID

This is the node identifier number.

c. Instruction Size

This is the node instruction size parameter in words.

d. Setup Time

This is the node setup time in cycles.

e. Execution Time

This is the node execution time parameter in cycles.

f. Breakdown Time

This is the node breakdown time parameter in cycles.

g. Processor Type

This is the processor number that the node will be assigned to.

2. Input File: QUEUES.IN

This file contains the initial queue information required for scheduling. The
number of queues parameter is an individual element. The remaining parameters exist for
each queue in the graph.

a. Number of Queues

This is the total number of queues in the data flow graph.

b. Queue ID

This is the queue identifier number.

c. Source Node

This is the node ID for the node at the tail of the queue.

d. Sink Node

This is the node ID for the node at the head of the queue.

e. Write Amount

This is the queue write amount parameter in words.

f. Read Amount

This is the queue read amount parameter in words.

3. Input File: PROCS.IN

This file is fully described in Section II. The only data taken from this file is the
number of processors parameter.

C.  OUTPUT  FILES 

Many output files are created for input to the mapping program.

1. Output File: ORDER.IN

This file is the mapping order of the nodes. The mapping occurs in the order the
nodes are listed.

a. Node ID

This is the node identifier of the next node to enter the system.

b. Time into System

This is the time when the node will be available to be mapped. Normally,
all nodes will have a time of '0', which means all nodes are available to be mapped
simultaneously from the start time.
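
The program listing in Appendix A reorders the ORDER.IN entries by increasing start time with a bubble sort. The sketch below shows the same ordering rule using the standard library instead; it is an illustration and not the program's code. It assumes each entry is a (node ID, time into system) pair, and the names OrderEntry and sort_by_entry_time are hypothetical. std::stable_sort is chosen so that nodes with equal times keep the order in which they were listed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical in-memory form of one ORDER.IN line.
struct OrderEntry {
    long node_id;
    long time_into_system;
};

// Sorts by increasing time; entries with equal times keep their listed order.
void sort_by_entry_time(std::vector<OrderEntry>& order) {
    std::stable_sort(order.begin(), order.end(),
                     [](const OrderEntry& a, const OrderEntry& b) {
                         return a.time_into_system < b.time_into_system;
                     });
}
```

In the normal case where every node has a time of '0', the call leaves the list unchanged.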
2. Output File: NODES.SNB.OUT

This file is similar in format to the NODES.IN file but also contains the calculated
values of setup and breakdown for the nodes in the system based on the user input. This
file is not used by the mapper program; it is for user information only.

3. Output File: cylinder.out

This file is a representation of the mapping of the cylinder. It is in the same
format as the file 'cylinder.dat', which is described fully as an input to the synchronization
arc generator (SAG) program; however, this file takes into account the possibility of
'wrap-around' of the breakdown of the last node on a processor. The name of this file
must be changed to 'cylinder.dat' before using it for input to the SAG program.

4.  Output  File:  cyl_stats.out 

In this file are several percentages to express the efficiency of the mapping. Two
sets of statistics are given. In the first, the largest completion time over all processors is
computed and all processors are assumed to run to this time ("flat cylinder"). The statistics
are then computed over the total processor-time required by the mapping. This is the
largest completion time over all processors multiplied by the number of processors. In the
second set ("jagged cylinder"), each processor completion time is calculated individually
and the statistics computed for each processor; the average is then taken of the individual
processor statistics.

a. Control Unit and Execution Unit Utilization Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) is performing useful work, either input or output for the control unit
or execution for the execution unit.

b. Control Unit and Execution Unit Blockage Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) is blocked, i.e., the unit has completed the specific task, but the node
cannot switch to the other unit as the other unit is currently busy.

c. Control Unit and Execution Unit Idle Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) has no node assigned.
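
The difference between the "flat" and "jagged" figures can be sketched as follows. This is a minimal illustration of the description above and not the code that produces cyl_stats.out. The names ProcessorSlice, flat_utilization and jagged_utilization are hypothetical, and busy_cycles stands for whatever useful-work count applies to the unit in question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-processor summary for one unit (control or execution).
struct ProcessorSlice {
    long busy_cycles;      // cycles the unit spent doing useful work
    long completion_time;  // cycle at which this processor finished
};

// "Flat" cylinder: every processor is charged the largest completion time,
// so the rate is taken over largest_time * number_of_processors.
double flat_utilization(const std::vector<ProcessorSlice>& procs) {
    assert(!procs.empty());
    long largest = 0;
    long busy = 0;
    for (const ProcessorSlice& p : procs) {
        largest = std::max(largest, p.completion_time);
        busy += p.busy_cycles;
    }
    assert(largest > 0);
    return 100.0 * busy / (static_cast<double>(largest) * procs.size());
}

// "Jagged" cylinder: each processor is rated over its own completion time,
// then the per-processor rates are averaged.
double jagged_utilization(const std::vector<ProcessorSlice>& procs) {
    assert(!procs.empty());
    double sum = 0.0;
    for (const ProcessorSlice& p : procs) {
        assert(p.completion_time > 0);
        sum += 100.0 * p.busy_cycles / p.completion_time;
    }
    return sum / procs.size();
}
```

For two processors with (busy, completion) values of (50, 100) and (40, 50), the flat rate is 90 / 200 = 45% while the jagged rate is the average of 50% and 80%, or 65%. The jagged figure is never lower than the flat one, because no processor is charged for time after its own completion.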

5. Output File: proc_stats.out

The processor statistics are outlined in this file. The statistic listings are
essentially the same as for 'cyl_stats.out', except that the statistics are computed over one
processor vice an average over all processors. Each processor is also treated as a 'jagged'
slice; that is, no attempt is made to find the greatest completion time of all processor slices.
The statistics are calculated based on the final completion time for each individual processor.
II. LGDF MAPPING PROGRAM


This section describes a Large Grain Data Flow mapping program (referred to as MAP)
which provides a detailed multiprocessor mapping of a data flow graph using the model
described in [Ref. 4]. The program is time driven. As events are scheduled to occur, the
event with the lowest time stamp will set the next time flag. When this flagged time occurs,
all nodes are checked for the next event to occur. A set of lists tracks which nodes are in the
various states of processing. This mapping is only concerned with the arithmetic
processors and the program nodes. Therefore, input and output nodes and the input/output
processors described in [Ref. 4] are not included in this mapping program or associated
data files. This program must be run prior to executing the synchronization arc generator
program or the simulator program discussed in Sections III and IV, respectively. This
program begins execution with the command 'map'.
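
The time-driven behavior described above can be sketched with a priority queue keyed on the event time stamp. This is an assumption about one way to organize such a loop, not a description of how MAP is implemented (the text states only that MAP keeps a set of lists). The names Event, EventQueue and next_events are hypothetical.

```cpp
#include <queue>
#include <vector>

// Hypothetical scheduled event.
struct Event {
    long time;     // time stamp in cycles
    long node_id;  // node the event applies to
};

// Orders the queue so that the event with the lowest time stamp is on top.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterFirst>;

// The lowest pending time stamp plays the role of the "next time flag".
// Removes and returns every event scheduled for that flagged time.
std::vector<Event> next_events(EventQueue& pending) {
    std::vector<Event> due;
    if (pending.empty()) return due;
    const long flagged_time = pending.top().time;
    while (!pending.empty() && pending.top().time == flagged_time) {
        due.push_back(pending.top());
        pending.pop();
    }
    return due;
}
```

A driver would call next_events repeatedly, process the returned events, and push any newly scheduled events back onto the queue until it is empty.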

A.  USER  INTERFACE 

The following inputs and options are available to the user:

1. SCHEDULER LATENCY TIME
COMMUNICATION TIME FOR ONE WORD

These inputs were fully discussed in Section I.

2. INTERACTIVE INTERFACE

The user can select whether or not to use the interactive interface. The interface
will allow the user to see the current state of the system at any time. Also, the user can
adjust the operation of the system by manipulating nodes which are waiting to begin
processing.
B. INPUT FILES

1. Input File: NODES.IN
Input File: QUEUES.IN

These files were also previously described in Section I.

2. Input File: CHAINS.IN

This file contains the initial chain information required for mapping. The number
of chains parameter is an individual element. The remaining parameters exist for each chain
in the graph. Note that this file is required to exist or execution will fail. If there are no
chains, then simply have '0' as the only entry in the file.

a. Number of Chains

This is the total number of chains in the system.

b. Chain ID

This is the chain identifier number.

c. Chained Nodes

The node IDs for the nodes included in the chain are listed in the order of
chaining. A '0' is used to identify the end of the node list for the chain.

3. Input File: PROCS.IN

The following information describes the hardware configuration.

a. Number of Arithmetic Processors

This is the total number of arithmetic processors in the system.

b. Processor Type

The processor type is listed for the number of processors in the system. For
example, if there are three processors, the numbers 1, 2, and 3 will be listed in a single
column.

4. Input File: ORDER.IN

This file is the mapping order of the nodes. The mapping occurs in the order the
nodes are listed. This file can be created manually by the user or can be generated using the
scheduler program. This file is fully described as an output to the scheduler program in
Section I.

C.  OUTPUT  FILES 

Many output files are created for complete information on the mapping.

1. Output Files: CON_EXE.OUT, CON_UNIT.OUT, EXE_UNIT.OUT

These three files provide an exact mapping of the nodes on the processors. The
events occurring at a specific time and the nodes involved are depicted. A key to the
markings is listed in each file. File 'CON_EXE.OUT' provides a complete mapping file,
file 'EXE_UNIT.OUT' is a mapping of the execution units only, and the file
'CON_UNIT.OUT' is a mapping of the control units only. These output file listings do
not take into account 'wrap-around' of the last node's breakdown time.

In each file are several percentages to express the efficiency of the mapping. An
important note about the statistics is that they are computed over the total processor-time
required by the mapping. This is the time to complete the mapping multiplied by the
number of processors. The percentages are therefore essentially an average of the
individual processor rates.

a. Processor Utilization Rate

This refers to the total percentage of processor-time that a processor is
performing some activity in either the control unit or the execution unit.

b. Processor Idle Rate

This refers to the total percentage of processor-time that a processor is not
performing any activity.

c. Control Unit and Execution Unit Utilization Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) is performing useful work, either input or output for the control unit
or execution for the execution unit.

d. Control Unit and Execution Unit Blockage Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) is blocked, i.e., the unit has completed the specific task, but the node
cannot switch to the other unit as the other unit is currently busy.

e. Control Unit and Execution Unit Idle Rate

This refers to the total percentage of processor-time that the specified unit
(control or execution) has no node assigned.

2.  Output  File:  SUMMARY.OUT 

This file summarizes the number of processors in particular states at any given
time. The event times in the three previous mapping files will match with this file. The
processor utilization percentages are displayed.

3. Output Files: NODES.OUT, PROCS.OUT, CHAINS.OUT

These three files provide extremely detailed data on specific nodes, processors,
and chains. The lines are well described within the output listings. Most of the items can
be cross-referenced to other files.

4. Output File: cylinder.dat

This file is a representation of the mapping of the cylinder. It is described fully
as an input to the synchronization arc generator (SAG) program. The inclusion of this file
is to provide the data necessary to run SAG based on the mapping generated by this
program without any adjustments.

D.  SELECTION  OF  THE  USER  INTERFACE  OPTION 

The selection of the user interface option will allow the user to observe and
interactively change the mapping as it progresses. However, once the mapping has
progressed past an event, it is not possible to go back and make a change. The interactive
interface is very descriptive. The user can view many aspects of the system and make
many changes during any pause. Selecting the 'CONTINUE WITH NEXT EVENT' option will
allow the mapping to continue. To discontinue the use of the interactive interface, select the
'CHANGE INTERRUPT STRATEGY' followed by 'CONTINUE TO CONCLUSION'
followed by 'CONTINUE WITH EVENT' options. This will allow the mapping to
complete.
III. LGDF SYNCHRONIZATION ARC GENERATOR


This section describes a Large Grain Data Flow model synchronization arc generator
program (referred to as SAG). This program acts as a preprocessor to the simulator
program (SIM). Its purpose is to modify the input files to SIM to be able to analyze the
revolving cylinder (RC) method as described in [Ref. 4]. SAG makes extensive use of
linked lists. SAG is started with the command 'generate'.

A.  USER  INTERFACE 

The  user  has  a  choice  of  one  of  two  arc  generation  techniques  in  SAG.  Both 
techniques  are  described  fully  in  [Ref.  4]. 

1. Start After Finish (SAF)

This selection will determine the synchronization arcs based on the start after
finish technique.

2. Start After Start (SAS)

This selection will determine the synchronization arcs based on the start after start
technique.

B. INPUT FILES

1. Input File: nodes.dat

This file is a tabular listing which completely describes the nodes of a data flow
graph. The number of nodes parameter is an individual element. The remaining
parameters exist for each node in the graph.

a. Number of Nodes

This is the total number of nodes in the data flow graph. This initializes the
counters necessary to read in the node data.
b. Node ID

This is the node identifier number, which must be unique for each node in
the system. Do not use '0' as a node ID.

c. Node Type

This identifies the type of node. This type defines how the node will be
handled in the programs.

(1) node type = 0: normal node

(2) node type = 1: input node

(3) node type = 2: output node

d. Instruction Size

This is the node instruction size parameter in words.

e. Execution Time

This is the node execution time parameter in cycles.

f. Setup Time

This is the node setup time parameter in cycles.

g. Breakdown Time

This is the node breakdown time parameter in cycles.

h. Required Processor Type

This is the type of processor required by the node. A listing of '100'
identifies an input/output processor.

i. Alternate Processor Type

This is the alternate processor type to be used if the required processor type
is unavailable. In most cases, the alternate is the same as the required processor type.

j. Memory Module Assignment

This is the memory module assignment for the node if the user defined
memory assignment option is chosen.

k. Node Priority

This is the assignment priority associated with the node if the user defined
priority option is chosen. A lower number represents a higher priority.

2. Input File: queues.dat

This file is a tabular listing which completely describes the queues of a data flow
graph. The number of queues parameter is an individual element. The remaining
parameters exist for each queue in the graph.

a. Number of Queues

This is the total number of queues in the system. This initializes the
counters necessary to read in the queue data.

b. Queue ID

This is the queue identifier number, which must be unique for each queue in
the system. Do not use '0' as a queue ID.

c. Queue Type

This identifies the type of queue. The type defines how the queue will be
handled in the programs.

(1) queue type = 0: data queue

(2) queue type = 1: input queue

(3) queue type = 2: output queue

(4) queue type = 3: synchronization arc

d. Source Node

This is the node ID for the node at the tail of the queue.

e. Sink Node

This is the node ID for the node at the head of the queue.

f. Write Amount

This is the queue write amount parameter in words.

g. Read Amount

This is the queue read amount parameter in words.

h. Produce Amount

This is the queue produce amount parameter in words.

i. Consume Amount

This is the queue consume amount parameter in words.

j. Threshold Amount

This is the queue threshold amount parameter in words.

k. Initial Length

This is the queue initial length parameter in words.

l. Capacity

This is the queue capacity parameter in words.

m. Memory Module Assignment

This is the memory module assignment for the queue if the user defined
memory assignment option is chosen.

3. Input File: machine.cfg

This file defines the system hardware configuration.

a. Number of Memory Modules

This is the number of memory modules to be modeled in the simulator.

b. Number of Input / Output Processors

This is the number of input / output (I/O) processors to be modeled in the
simulator. Normally there is only one I/O processor.

c. Number of Arithmetic Processors

This is the number of arithmetic processors in the system.

d. Processor Types

This is a list of the types of processors defined, with the number of elements
in the list equal to the number of processors, excluding I/O processors, which are
automatically defined as '100'. If synchronization arcs without nodes bound to processors
are desired, the user should enter a '0' for each element. If, however, the user desires
synchronization arcs generated with nodes bound to processors, each element should
correspond to a processor type. For example, if there are three processors, the numbers 1,
2, and 3 should be listed in a column.

4. Input File: cylinder.dat

This file is a representation of the mapping of nodes on the processors. If an
analysis of a cylinder with no wrap-around is desired, this file will be generated by the
external mapping program (MAP). If an analysis of a cylinder with wrap-around is
desired, this file is generated by the scheduler program, after the filename is modified from
'cylinder.out'.

a. Number of Nodes on a Processor

For each arithmetic processor in the system, the number of nodes which
used that processor is given. Following the node total, the following data is provided for
each node on the given processor:

(1) Node ID

(2) The node start time on the processor

(3) The node finish time on the processor

b. Cylinder Size

Following the listing of the nodes, the time to complete the cylinder slice is
given. This is equal to the longest processor busy time of all the processors in the system.

C.  OUTPUT  FILES 

Many output files are created for complete information on the mapping.

1. Output File: queues.dat

This file has the same format as described previously for 'queues.dat'.
However, synchronization arcs have been appended to the end of the file as determined by
this program. This adjusted 'queues.dat' file may now be used by the simulator to analyze
the revolving cylinder (RC) scheduling techniques.

2. Output File: oqueues.dat

This file is a copy of the original 'queues.dat'. Since this program modifies the
file 'queues.dat', this file will allow for easy recovery back to the original graph
description, prior to the addition of synchronization arcs.

3. Output File: indexcyl.out

This file provides the same information as the 'cylinder.dat' file. In addition, the
appropriate index for the node is provided as described in [Ref. 4].

4. Output File: tokens.out

This file lists important information about the synchronization arcs, including the
source node, sink node, initial length (number of tokens), threshold amount, consume
amount, and produce amount.

5. Temporary File: rqueues.tmp

This file is a temporary file created during execution which will provide no useful
information to the user.


rv.  LGDF  SIMULATOR 


This  secti(ni  describes  a  simulator  (lefened  to  as  SIM)  for  a  Large  Grain  Data  Flow 
model  described  in  [Ref.  4].  SIM  is  an  event>driven  program  that  makes  extensive  use  of 
linked  lists.  SIM  is  started  with  the  command  ‘simulate'. 

A.  USER  INTERFACE 

There  are  many  inputs  and  optimis  available  to  the  user.  They  are  presented  below 
exacdy  as  diey  ^)pear  in  die  program. 

1.  COMMENT  LINE 

This  is  a  comment  which  will  be  diqdayed  at  the  head  of  the  data  set  in  the 
statistics  file  to  enable  die  user  to  easily  distiqgnish  the  file  output  Results  from  successive 
execudons  of  SIM  can  be  dumped  to  a  single  file  without  losiitg  tradt;  of  the  data  sets. 

2.  THE  INSTANCE  NUMBER  TO  START  GATHERING  RESULTS 
This  is  die  iryut  instance  of  die  gryh  to  start  gathering  tfarouglyut  and  udlizatioo 

results  frcnn  die  sunuladcm. 

3.  THE  INSTANCE  NUMBER  TO  TERMINATE  THE  SIMULATION 
This  is  die  ouQiut  instance,  whidi  when  cmnpleted,  will  terminaie  the  sunuladcm. 

4. SCHEDULER LATENCY TIME (cycles)

This is the scheduler latency for any queue variations in the scheduler internal
memory. This could be the time taken by the scheduler to manipulate its internal data
structures.

5.  COMMUNICATION  TIME  FOR  ONE  WORD  (cycles) 

This is the time to transmit one word of data between a memory unit and a
processor across the data transfer network.

6. DATA RATE OPTION

Two options are available:

a.  User  Defined 

The user will be prompted for further input of the time interval which will
pass after the inputs for one graph iteration are entered into the system until the input
data for the next graph iteration are entered into the system. The prompt seen by the user
is: ENTER THE DATA PERIOD BEFORE THE NEXT GRAPH ITERATION (cycles).

b.  Maximum  Throughput 

The simulator will generate data for consecutive graph iterations to ensure
that the input queue is constantly filled. This will drive the machine at its maximum
throughput. This effectively permits the user to determine the upper bound in the input data
rate for the given configuration.

7.  MEMORY  MAPPING  OPTIONS 

Two options are available:

a. User Defined Mapping

This option will map nodes and queues to memory modules as defined in
the nodes.dat file.

b. Arbitrary Mapping

The simulator will arbitrarily assign nodes and queues to memory modules.

8.  NODES  ON  READY  LIST  OPTION 

Two  options  are  available: 

a.  Only  One  Node  Instance  can  be  on  Ready  list 

Only one instance of a node can be maintained on the ready list at any given
time.

b. Multiple Node Instances can be on Ready list

Multiple instances of a node can be maintained on the ready list at any given
time. However, only one instance of the node can be processing.

9. NODES EXECUTION PRIORITY OPTIONS

Several options are available to place nodes in the ready list:

a.  No  Priority 

Nodes are executed on a First-Come-First-Served (FCFS) basis, i.e.,
according to the order in which they are ready.

b.  User  Defined 

The node priorities are as defined in the file 'nodes.dat'. This allows the
user to designate critical nodes to be assigned to a processor immediately when data is
available.

c. Shortest Execution Time First

A ready node with a shorter execution time will be assigned before a ready
node with a longer execution time.

d.  Longest  Execution  Time  First 

A ready node with a longer execution time will be assigned before a ready
node with a shorter execution time.
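
The four ready-list orderings above can be sketched as sort keys. This is an illustrative Python fragment, not SIM code. The ReadyNode fields, the policy names, the use of ready time as a tie-break, and the convention that a smaller user priority value is served first are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReadyNode:
    node_id: int
    ready_time: int      # cycle at which the node became ready
    exec_time: int       # execution time in cycles
    priority: int = 0    # user-defined priority (assumed: smaller is served first)


# One sort key per priority option; ties fall back to FCFS order (an assumption).
POLICIES = {
    "fcfs": lambda n: n.ready_time,                     # No Priority
    "user": lambda n: (n.priority, n.ready_time),       # User Defined
    "shortest": lambda n: (n.exec_time, n.ready_time),  # Shortest Execution Time First
    "longest": lambda n: (-n.exec_time, n.ready_time),  # Longest Execution Time First
}


def order_ready_list(nodes, policy):
    """Return the ready nodes in the order they would be handed to processors."""
    return sorted(nodes, key=POLICIES[policy])


def ids(policy):
    """Node identifiers of a small hypothetical ready list under one policy."""
    nodes = [
        ReadyNode(107, ready_time=0, exec_time=1_000_000, priority=2),
        ReadyNode(108, ready_time=5, exec_time=50_000, priority=1),
        ReadyNode(109, ready_time=9, exec_time=400_000, priority=3),
    ]
    return [n.node_id for n in order_ready_list(nodes, policy)]
```

For the three sample nodes, ids("fcfs") keeps arrival order, while ids("shortest") moves node 108 to the front because its execution time is the smallest.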

B.  INPUT  FILES 

Three input files are required by the simulator.

1. Input File: nodes.dat

The contents of this input file are described fully as an input to the
synchronization arc generator program in Section 4.

2. Input File: queues.dat

The contents of this input file are described fully as an input to the
synchronization arc generator program in Section 4.

3. Input File: machine.cfg

The contents of this input file are described fully as an input to the
synchronization arc generator program in Section 4.

C. OUTPUT FILES

Three output files are generated by this program.

1.  Output  Files:  starts.out 

This is a listing of the graph instance and the start times of those instances being
measured.

2. Output File: endtimes.out

This  is  a  listing  of  the  graph  instance  and  end  times  of  those  instances  being 
measured. 

3.  Output  Files:  stats.out 

This file summarizes the data from a given simulation. The same information is
displayed to the standard output upon program completion. Note that this file is an
appended file, so additional simulation results are added to the end of the file, which enables
easier comparison of multiple tests. The following data is provided:

a.  COMMENT 

The comment line input by the user.

b. DATA RATE OPTION

The number corresponding to the choice made at the start of the program.

c. MEMORY MAPPING OPTION

The number corresponding to the choice made at the start of the program.

d. NODES ON READY LIST OPTION

The number corresponding to the choice made at the start of the program.

e. NODES EXECUTION PRIORITY OPTION

The number corresponding to the choice made at the start of the program.

f. START INSTANCE

The dataflow graph instance where measurements were started.

g. END INSTANCE

The dataflow graph instance which terminated the simulation.

h. SCHEDULER LATENCY TIME (cycles)

The scheduler latency for any queue adjustment in the scheduler internal memory.


i. COMMUNICATION TIME FOR ONE WORD (cycles)

The communication time to transfer one word of data between a memory
module and a processor.

j. ITERATION DATA PERIOD (cycles)

The time differential for the input of consecutive data flow graph iterations,
or the statement 'MAX THROUGHPUT' if the maximum throughput input option was
chosen.


k.  PROCESSOR  DATA 

For each processor in the system, the following data is provided:

(1) ID - the processor identifier

(2) TYPE - the processor type (100 refers to I/O processors)

(3) UTILIZATION - the overall processor utilization rate

(4) EXECUTION - the utilization rate of the execution unit

(5) EXE/UTIL - the amount of execution as part of the overall utilization

(6) UTIL-EXE - the amount of communication not overlapped with execution
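
One plausible reading of how the last four figures relate to each other is sketched below in Python. It is not taken from the SIM source. It assumes that a processor's busy time covers both execution and communication, that all rates are fractions of the measured period, and that the function and argument names are hypothetical.

```python
def processor_report(busy_cycles: int, exec_cycles: int, measured_cycles: int) -> dict:
    """Derive the four per-processor rates from raw cycle counts (assumed inputs)."""
    return {
        # fraction of the measured period in which the processor was busy
        "UTILIZATION": busy_cycles / measured_cycles,
        # fraction of the measured period in which the execution unit was busy
        "EXECUTION": exec_cycles / measured_cycles,
        # share of busy time spent executing; 0.0 for a processor that was never busy
        "EXE/UTIL": exec_cycles / busy_cycles if busy_cycles else 0.0,
        # busy time that was communication not hidden behind execution
        "UTIL-EXE": (busy_cycles - exec_cycles) / measured_cycles,
    }


report = processor_report(busy_cycles=800, exec_cycles=600, measured_cycles=1000)
```

Under these assumptions, a processor that is busy for 800 of 1000 measured cycles and executing for 600 of them spends three quarters of its busy time executing, and the remaining 200 cycles are non-overlapped communication.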


l. AVERAGE PROCESSOR UTILIZATION

The average amount over all arithmetic processors (excluding I/O) of
processor utilization during the period measurements are taken.

m. AVERAGE PROCESSOR EXECUTION

The average amount over all arithmetic processors (excluding I/O) of
execution unit utilization during the period measurements are taken.

n. AVERAGE EXECUTION / UTILIZATION RATE

The  average  amount  over  all  arithmetic  processors  (excluding  I/O)  of 
execution  unit  utilization  as  a  portion  of  total  utilization  during  the  period  measurements  are 
taken. 

o. AVERAGE NON-OVERLAPPED COMMUNICATION RATE

The average amount over all arithmetic processors (excluding I/O) of
communication not overlapped with execution during the period measurements are taken.

p. NORMALIZED DATA RATE

The rate of input into the system compared to the optimum execution
completion time. A value of '0' means the maximum throughput option was chosen.

q. SIMULATION TIME (cycles)

The total time in cycles for the simulation to run to completion.

r.  AVERAGE  RESPONSE  TIME  (cycles) 

The average length of time over all measured graph instances for a graph
instance to complete.

s. AVERAGE THROUGHPUT (Instances per Megacycle)

The average number of data flow graphs completed per million cycles
during the time measurements are taken.

t. INSTANCE LENGTH STANDARD DEVIATION

The standard deviation of the completion time of the measured instances.

u. COEFFICIENT OF VARIATION

The instance length standard deviation divided by the average response time.
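
The four instance-level figures (average response time, average throughput, standard deviation, and coefficient of variation) can be recomputed from per-instance start and end times such as those listed in the start and end time output files. The Python sketch below is illustrative only. It assumes the population form of the standard deviation and a measurement window running from the first measured start to the last measured end; the manual does not state which conventions SIM uses, and the function name and sample times are hypothetical.

```python
import statistics


def instance_stats(starts, ends):
    """Summarize measured graph instances from their start and end times (cycles)."""
    lengths = [end - start for start, end in zip(starts, ends)]
    avg_response = statistics.fmean(lengths)
    stdev = statistics.pstdev(lengths)       # population form: an assumption
    window = max(ends) - min(starts)         # measurement window: an assumption
    return {
        "avg_response": avg_response,
        "throughput_per_megacycle": len(lengths) * 1_000_000 / window,
        "stdev": stdev,
        "coeff_of_variation": stdev / avg_response,
    }


# Hypothetical start and end times for eight measured instances.
starts = [0, 10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000]
ends = [2_000, 14_000, 24_000, 34_000, 45_000, 55_000, 67_000, 79_000]
summary = instance_stats(starts, ends)
```

A coefficient of variation near zero indicates that every graph instance takes about the same time to complete, which is the predictable, periodic behavior the applications require.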

v. I/O COMMUNICATION TIME FOR ONE GRAPH INSTANCE

The required communication time (in cycles) for one data flow graph
instance which occurs on the I/O processors.

w. I/O CALCULATION TIME FOR ONE GRAPH INSTANCE

The required calculation time (in cycles) for one data flow graph instance
which occurs on the I/O processors.

x. NODE COMMUNICATION TIME FOR ONE GRAPH INSTANCE

The amount of communication time (in cycles) which is related to the graph
nodes (setup latency, breakdown latency, and the time to load the instruction), excluding
that which occurs on the I/O processors.

y. QUEUE COMMUNICATION TIME FOR ONE GRAPH INSTANCE

The amount of communication time (in cycles) which is related to the graph
queues (reading and writing data), excluding that which occurs on the I/O processors.

z. COMMUNICATION TIME FOR ONE GRAPH INSTANCE

The required communication time (in cycles) of one data flow graph
instance. This does not include the communication time and control time for input and
output nodes and queues.

aa.  CALCULATION  TIME  FOR  ONE  GRAPH  INSTANCE 

The required calculation time (in cycles) of one data flow graph instance.
This does not include the calculation time for input and output nodes.

ab. IDEAL CYLINDER COMMUNICATION OF ONE INSTANCE

The amount of communication time (in cycles) which would be equally
divided among the arithmetic processors. This does not include the communication which
occurs on the I/O processors.

ac. IDEAL CYLINDER CALCULATION OF ONE INSTANCE

The amount of calculation time (in cycles) which would be equally divided
among the arithmetic processors. This does not include the calculation which occurs on the
I/O processors.

ad. COMMUNICATION/CALCULATION RATIO

The ratio of the communication time to the computation time for one
instance. This does not include the communication and calculation which occurs on the I/O
processors.

APPENDIX C: SAMPLE INPUT DATA FILES


NODES.IN 

20

101  0  0    50000  0  6
102  0  0    50000  0  3
103  0  0   150000  0  7
104  0  0   150000  0  8
105  0  0   100000  0  4
106  0  0   100000  0  5
107  0  0  1000000  0  1
108  0  0    50000  0  1
109  0  0   400000  0  7
110  0  0  1000000  0  2
111  0  0   400000  0  8
112  0  0    75000  0  2
113  0  0  1000000  0  3
114  0  0  1000000  0  4
115  0  0  1000000  0  5
116  0  0    50000  0  8
117  0  0   800000  0  6
118  0  0    50000  0  6
119  0  0   100000  0  7
120  0  0   100000  0  8

QUEUES.IN

25

 1    0  101  16384  16384
 2    0  102  16384  16384
 3  101  103  16384  16384
 4  102  104  16384  16384
 5  103  105  16384  16384
 6  104  106  16384  16384
 7  105  107   4096   4096
 8  106  108   4096   4096
 9  107  109   4096   4096
10  108  110   4096   4096
11  109  112   4096   4096
12  109  113   4096   4096
13  110  111   4096   4096
14  111  112   4096   4096
15  111  114   4096   4096
16  112  115   4096   4096
17  113  116      4      4
18  114  116      4      4
19  115  117   2052   2052
20  116  117      4      4
21  117  118    513    513
22  117  120    513    513
23  118  119    513    513
24  119    0    513    513
25  120    0    513    513

nodes.dat


22

101  0  0    50000  0  0    6    0  6  0
102  0  0    50000  0  0    3    0  3  0
103  0  0   150000  0  0    7    0  7  0
104  0  0   150000  0  0    8    0  8  0
105  0  0   100000  0  0    4    0  4  0
106  0  0   100000  0  0    5    0  5  0
107  0  0  1000000  0  0    1    0  1  0
108  0  0    50000  0  0    1    0  1  0
109  0  0   400000  0  0    7    0  7  0
110  0  0  1000000  0  0    2    0  2  0
111  0  0   400000  0  0    8    0  8  0
112  0  0    75000  0  0    2    0  2  0
113  0  0  1000000  0  0    3    0  3  0
114  0  0  1000000  0  0    4    0  4  0
115  0  0  1000000  0  0    5    0  5  0
116  0  0    50000  0  0    8    0  8  0
117  0  0   800000  0  0    6    0  6  0
118  0  0    50000  0  0    6    0  6  0
119  0  0   100000  0  0    7    0  7  0
120  0  0   100000  0  0    8    0  8  0
  1  1  0        0  0  0  100  100  6  0
  2  2  0        0  0  0  100  100  6  0

queues.dat

25

 1  1    1  101  16384  16384  16384  16384  16384  0  131072  11
 2  1    1  102  16384  16384  16384  16384  16384  0  131072   4
 3  0  101  103  16484  16384  16384  16384  16384  0  131072   7
 4  0  102  104  16384  16384  16384  16384  16384  0  131072   5
 5  0  103  105  16384  16384  16384  16384  16384  0  131072   8
 6  0  104  106  16384  16384  16384  16384  16384  0  131072   6
 7  0  105  107   4096   4096   4096   4096   4096  0   32768   1
 8  0  106  108   4096   4096   4096   4096   4096  0   32768   3
 9  0  107  109   4096   4096   4096   4096   4096  0   32768  10
10  0  108  110   4096   4096   4096   4096   4096  0   32768   2
11  0  109  112   4096   4096   4096   4096   4096  0   32768   2
12  0  109  113   4096   4096   4096   4096   4096  0   32768   3
13  0  110  111   4096   4096   4096   4096   4096  0   32768   9
14  0  111  112   4096   4096   4096   4096   4096  0   32768   1
15  0  111  114   4096   4096   4096   4096   4096  0   32768   4
16  0  112  115   4096   4096   4096   4096   4096  0

APPENDIX D: SAMPLE RUN OF PROGRAMS


This appendix outlines the general procedure for running a simulation session with the
programs in this thesis and [Ref. 4]. Input and output files will be indicated in bold.
Commands required to run programs will be indicated in italics. Consult the user's
manual (Appendix B) for detailed descriptions of the program input and output files. Note
that all output files are opened for writing (except stats.out) during program execution.
This means that file names must be modified when running successive iterations of the
same program or data will be lost.
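
Because output files are overwritten on each run, it can help to copy them aside under a run-specific name before starting the next run. The following Python sketch shows one way to do this. It is not part of the thesis programs: the function name, the tag scheme, and the choice to copy rather than rename are assumptions, and the two file names are the simulator outputs listed in the user's manual (the outputs of the other programs would need to be added).

```python
import shutil
import tempfile
from pathlib import Path

# Files reopened for writing on every run; stats.out is appended to, so it is skipped.
OVERWRITTEN = ("starts.out", "endtimes.out")


def archive_outputs(run_dir, tag, names=OVERWRITTEN):
    """Copy each existing output file to '<name>.<tag>' and return the new names."""
    saved = []
    for name in names:
        src = Path(run_dir) / name
        if src.is_file():
            dst = src.with_name(f"{name}.{tag}")
            shutil.copy2(src, dst)
            saved.append(dst.name)
    return saved


def demo():
    """Archive one stand-in output file inside a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        (Path(scratch) / "starts.out").write_text("1 0\n")
        return archive_outputs(scratch, "run1")
```

Calling archive_outputs after each run, with a different tag each time, keeps earlier results available for comparison.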

1. Modify the NODES.IN, QUEUES.IN, PROCS.IN, and CHAINS.IN files
for the graph to be analyzed.

2. Run the node allocation program by typing schedule and entering the proper input
data. This program will generate the ORDER.IN file and the cylinder.out file.

3. Run the mapper program by typing map and entering the proper input data. This
will generate the cylinder.dat file and other descriptive output files.

There are now several different steps to be taken, depending on whether it is desired to
analyze the FCFS or RC technique and dependent on whether wrap-around or no wrap-around
is desired. Each of the techniques and variations will be discussed separately.

A.  FCFS 

Modify the nodes.dat, queues.dat, and machine.cfg files for the graph to be
analyzed. The cylinder.dat file must also be present for the program to run, although it
is not necessary to be concerned about the wrap-around or no wrap-around option since the
simulator does not use this file for FCFS. Run the simulator program by typing simulate
and enter the proper input data.

B. RC TECHNIQUES

For the various RC techniques, synchronization arcs must be generated before the
simulator program is run.

1. Start-after-start (SAS) or start-after-finish (SAF) without binding
nodes to processors.

The machine.cfg file must contain a zero for each processor assigned.

a. No wrap-around

Use the cylinder.dat file from the mapper program.

b. Wrap-around:

The cylinder.out file must be renamed to cylinder.dat. Ensure the
cylinder.dat file is renamed first or it will be overwritten.

Run the synchronization arc program by typing generate. Select the technique
desired. If it is desired to generate another set of synchronization arcs for a different
technique (i.e., SAF arcs are generated, and SAS is now desired) the queues.dat file
must be renamed (to something appropriate, e.g., queues.SAF) and the oqueues.dat
file renamed to queues.dat. The generate program can now be run for the new technique.

2.  Start-after-start  (SAS)  or  start-after-finish  (SAF)  with  binding 
nodes  to  processors. 

For this technique, the machine.cfg file must now contain a number for each
processor assigned (i.e., 1, 2, etc.). The same rules apply as above for generation of arcs
for wrap-around and no wrap-around techniques.
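
The file swaps described in this section are easy to get wrong by hand, so a small helper can make them repeatable. The Python sketch below is an illustration under stated assumptions, not part of the thesis programs. It copies a saved variant onto the fixed name a program reads instead of renaming it, so the saved variant is never lost. The function name and the saved-variant name cylinder.W are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_inputs(workdir, selection):
    """Copy each saved variant onto the fixed file name the programs read.

    selection maps fixed name -> saved variant, e.g. {"cylinder.dat": "cylinder.W"}.
    All sources are checked first so a missing file cannot leave a half-staged mix.
    """
    workdir = Path(workdir)
    missing = [src for src in selection.values() if not (workdir / src).is_file()]
    if missing:
        raise FileNotFoundError(f"missing saved variants: {missing}")
    for fixed, src in selection.items():
        shutil.copyfile(workdir / src, workdir / fixed)


def demo():
    """Stage a wrap-around cylinder file and a set of SAF arcs in a scratch directory."""
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        (root / "cylinder.W").write_text("wrap-around cylinder\n")
        (root / "queues.SAF").write_text("SAF arcs\n")
        stage_inputs(root, {"cylinder.dat": "cylinder.W", "queues.dat": "queues.SAF"})
        return (root / "cylinder.dat").read_text(), (root / "queues.dat").read_text()
```

Copying instead of renaming trades a little disk space for the guarantee that every saved variant survives a run.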

C.  SIMULATIONS 

For the simulations it is important to maintain the proper input files. Ensure the
queues.dat file matches the appropriate cylinder.dat file, i.e., wrap or no wrap, and that
the machine.cfg file corresponds to the desired binding or no binding condition. As an
example, say the input files were originally named (after generating the synchronization
arcs) queues.SAFnW (no wrap), queues.SAFbnW (bound, no wrap),
queues.SAFW (with wrap), queues.SAFbW (bound, with wrap), cylinder.W
(with wrap), cylinder.nW (no wrap), machine.cfgb (bound), and machine.cfgnb
(not bound). In order to run a simulation for the no wrap, non-bound configuration, the
files queues.SAFnW, cylinder.nW, and machine.cfgnb must be renamed to
queues.dat, cylinder.dat, and machine.cfg. The simulator program can now be run
by typing simulate. Remember that the three files named above must be renamed again in
order to simulate new RC techniques.